Abstract

We define a doubly infinite, monotone labeling of Bienaymé-Galton-Watson (BGW) genealogies.
The genealogy of the current generation backwards in time is uniquely determined by the
coalescent point process $(A_i; i \ge 1)$, where $A_i$ is the coalescence time between
individuals $i$ and $i+1$. There is a Markov process of point measures $(B_i; i \ge 1)$ keeping
track of more ancestral relationships, such that $A_i$ is also the first point mass of $B_i$.

This process of point measures is also closely related to an inhomogeneous spine decomposition
of the lineage of the first surviving particle in generation $h$, in a planar BGW tree
conditioned to survive $h$ generations. The decomposition involves a point measure $\rho$ storing
the number of subtrees on the right-hand side of the spine. Under appropriate conditions,
we prove convergence of this point measure to a point measure on $\mathbb{R}_+$ associated with the
limiting continuous-state branching (CSB) process. We prove the associated invariance
principle for the coalescent point process, after we discretize the limiting CSB population
by considering only points with coalescence times greater than $\varepsilon$.

The limiting coalescent point process $(B_i^\varepsilon; i \ge 1)$ is the sequence of depths greater than
$\varepsilon$ of the excursions of the height process below some fixed level. In the diffusion case, there
are no multiple ancestries and (it is known that) the coalescent point process is a Poisson
point process with an explicit intensity measure. We prove that in the general case the
coalescent process with multiplicities $(B_i^\varepsilon; i \ge 1)$ is a Markov chain of point masses and we
give an explicit formula for its transition function.

The paper ends with two applications in the discrete case. Our results show that the
$A_i$'s are i.i.d. when the offspring distribution is linear fractional. Also, the law
of Yaglom's quasistationary population size for subcritical BGW processes is disintegrated 
with respect to the time to most recent common ancestor of the whole population. 

Running head. The coalescent point process of branching trees. 
1 Introduction

The idea of describing the backward genealogy of a population is ubiquitous in population 
genetics. The most popular piece of work on the subject is certainly [18], where the standard 
coalescent is introduced, and shown to describe the genealogy of a finite sample from a pop- 
ulation with large but constant population size. Coalescent processes for branching processes 
cannot be characterized in the same way, since, for example, they are not generally Markov 
as time goes backwards, although for stable CSB processes the genealogy can be seen as the 
time-change of a Markovian coalescent [6].

The present paper relies on the initial works [1, 28], which focused on the coalescent point
process for critical birth-death processes and the limiting Feller diffusion. They have been
extended to non-critical birth-death processes [14] and more generally to homogeneous, binary
Crump-Mode-Jagers processes [24]. In all these references, simultaneous births were not
allowed, since then the genealogical process would have to keep memory of the multiplicity
of all common offspring of an ancestor. The problem of sampling a branching population or a
coalescent point process has received some attention [20, 23, 30], but no consistent sampling in
the standard coalescent has been given so far (except Bernoulli sampling of leaves). Our goal
was to define a coalescent point process for arbitrary branching processes that is both simple
to describe in terms of its law and allows for finite sampling in a consistent way. That is, the
coalescent processes for samples of size $n \ge 1$ are all embedded in the same object.

A different way of characterizing the genealogy of a branching population alive at some
fixed time is with a reduced tree, first studied in [10] and generalized in [11, Section 2.7] to CSB
processes. To construct it, a starting date 0 and a finishing date $T$ are specified, and a reduced
branching tree started at 0 and conditioned to be alive at $T$ is defined by erasing the points
without alive descendants at time $T$. Instead of directly displaying the coalescence times as a
sequence running over the current population size (as is our goal), this approach characterizes
the transition probabilities of the reduced branching process by tracking the population size
in time with an inhomogeneous Markov process on $[0,T]$ taking values in the set of integers
for all times in $[0,T)$. Unfortunately, this construction does not allow for a consistent way of
sampling the individuals alive at time $T$.
We use a different approach and construct a coalescent process with multiplicities for the
genealogy of some random population, when the forward time genealogy is produced by a 
general branching process, either discrete or continuous-state. Our main goal is to give a 
simple representation for this process, and describe its law in a manner that would be easy to 
use in applications. 

We are interested in having an arbitrarily large population at present time, that we think 
of as generation 0, arising from a general branching process, originating at an unspecified 
arbitrarily large time in the past. In order to keep track of genealogies of individuals in the 
present population, for discrete Bienaymé-Galton-Watson (BGW) branching processes we use
a representation that labels individuals at each generation in such a way that lines of descent
of the population at any time do not intersect (see Fig. 1). This leads to a monotone planar
embedding of BGW trees such that each individual is represented by $(n, i)$ where $n \in \mathbb{Z}$ is
the generation number and $i \in \mathbb{N}$ is the individual's index in the generation. We consider
BGW trees which are doubly infinite, allowing the number of individuals alive at present time 
to be arbitrarily large, and considering an infinite number of generations for their ancestry 
back in time. This monotone representation can also be extended to the case of continuous- 
state branching processes (later called CSB processes), in terms of a linearly ordered planar 
embedding of the associated genealogical R-trees. This is in some way implicit in many recent 
works: our embedding is the exact discrete analogue of the flow of subordinators defined in [3]; 
also, in the genealogy of CSB processes defined in [26], the labeling of individuals at a fixed
generation is analogous to the successive visits of the height process at a given level. 

From our discrete embedding of the population, a natural consequence is that coalescence
times, or times backwards to the most recent common ancestor, between individuals $k$ and $\ell$
in generation 0, are represented by the maximum of the values $A_i$, $i = k, \dots, \ell - 1$, where $A_i$ is
the coalescence time between individuals $i$ and $i+1$. The genealogy back in time of the present
population is then uniquely determined by the process $(A_i; i \ge 1)$, which we call the branch
lengths of the coalescent point process. In the continuous-state branching process setting, the
$A_i$'s are given by depths of the excursions of the height process below the level representing
the height of individuals in the present population.

In general, the process $(A_i; i \ge 1)$ is not Markovian and its law is difficult to characterize.
Our key strategy is to keep track of multiple ancestries to get a Markov process, thus
constructing a coalescent point process with multiplicities $(B_i; i \ge 1)$. Intuitively, for each $i \ge 1$,
$B_i$ encodes the relationship of the individual $i+1$ to the infinite spine of the first present
individual, by recording the nested sequence of subtrees that form the ancestral lineage linking
$i+1$ to the spine. The value of $B_i$ is a point mass measure, where each point mass encodes
one of these nested subtrees: the level of the point mass records the level back into the past
at which this subtree originated, while the multiplicity of the point mass records the number
of subtrees with descendants in individuals $\{i+1, i+2, \dots\}$ emanating at that level which are
embedded on the right-hand side of the ancestral link of $i+1$ to the spine.

Formally, for both BGW and the CSB population, we proceed as follows. We define a
process with values in the integer-valued measures on $\mathbb{N}$ or $\mathbb{R}_+$ (BGW or CSB, respectively)
in a recursive manner, in such a way that $(B_j; 1 \le j \le i)$ will give a complete record of the
ancestral relationship of individuals $1 \le j \le i$. Start with the leftmost individual in the
present population, $i = 1$, and let $B_1$ have a point mass at $n$, where $-n$ is the generation of
the last common ancestor of individuals 1 and 2, and have multiplicity $B_1(\{n\})$ equal to the
number of times this ancestor will also appear as the last common ancestor for individuals
ahead, with index $i > 1$. We then proceed recursively, so that at individual $i$ we take the point
masses in $B_{i-1}$, and we first make an adjustment in order to reflect the change in multiplicities due
to the fact that the lineage of individual $i$ is no longer considered to be an individual ahead
of the current one. We then add a record for the last common ancestor of individuals $i$ and
$i+1$. If this is an ancestor that has already been recorded in $B_{i-1}$ we just let $B_i$ be the updated
value of $B_{i-1}$. Otherwise, we let $B_i$ be the updated $B_{i-1}$ plus a new point mass at $n$, if $-n$ is
the generation of this new last common ancestor, whose multiplicity $B_i(\{n\})$ is the number of
times this new ancestor will appear as the last common ancestor for individuals ahead, with
index $\ge i+1$. (For example, in Fig. 1 we will have $B_1 = 2\delta_1$, $B_2 = \delta_1$, $B_3 = \delta_2$, $B_4 = \delta_1$,
$B_5 = \delta_6$, $B_6 = \delta_1$, $B_7 = 2\delta_3$, $B_8 = \delta_1 + \delta_3$, $B_9 = \delta_3$.)

Once we have constructed the coalescent point process with multiplicities (in either the
BGW or CSB case), we will show that $A_i$ can be recovered as the location of the non-zero
point mass in $B_i$ with the smallest level, that is, $A_i = \inf\{n : B_i(\{n\}) \ne 0\}$ (for example, in
Fig. 1 we have $A_1 = 1$, $A_2 = 1$, $A_3 = 2$, $A_4 = 1$, $A_5 = 6$, $A_6 = 1$, $A_7 = 3$, $A_8 = 1$, $A_9 = 3$).
More importantly, we prove that $(B_i; i \ge 1)$ is a Markov process, and that, when going from
individual $i$ to $i+1$, the transitions decrease the multiplicity of the point mass of $B_i$ at level
$\inf\{n : B_i(\{n\}) \ne 0\}$ by 1, and that with a specified probability a new random point mass is
added at a random level that must be smaller than the smallest non-zero level in the updated
version of $B_i$.

In order to use this construction for both the BGW and the CSB population, our take on what
constitutes the sequence of present individuals has to be different for the continuous CSB
population from the simple one for the discrete BGW population. Since in the CSB case the
present population size is not discrete, and there is an accumulation of immediate ancestors
at times arbitrarily close to the present time, we have to discretize the present population by
considering only the individuals whose last common ancestor occurs at a time at least an
amount $\varepsilon$ below the present time, for an arbitrary $\varepsilon > 0$. We will later show that we can obtain
the law of the coalescent point process with multiplicities for the CSB population as a limit of
a sequence of appropriately rescaled coalescent point processes with multiplicities for the BGW
population, for which we also use the same discretization of the present population. At
first it may seem surprising that these coalescents with multiplicities are Markov processes over
the set of all individuals (BGW case) or the $\varepsilon$-discretized set of individuals (CSB case) at present
time. Below we intuitively explain why this is the case by describing the two approaches for
constructing them.

In the discrete case, we start by giving a related, easier to define, process $(D_i; i \ge 1)$, taking
values in the integer-valued sequences, whose first nonzero term is also at level $A_i$. For each $i$,
the sequence $(D_i(n), n \ge 1)$ gives the number of younger offshoots at generation $-n$ embedded
on the right-hand side of the ancestor of $i$. The trees sprouting from the younger offshoots are
independent, and a tree sprouting from a younger offshoot at generation $-n$ has
the law of the BGW tree conditioned to survive at least $n-1$ generations. It turns out that
$(D_i; i \ge 1)$ is Markov, and we construct $(B_i; i \ge 1)$ from it, show that it is Markov as well, and
give its transition law. In order to be able to deal with the conditioning of younger offshoots in
a way that allows us later to pass to the limit, we introduce an integer-valued measure $\rho$ that
takes the ancestor of individual 1 at generation $-n$ and records as $\rho(\{n\}) = \rho_n$ the number
of all of its younger offshoots embedded on the right-hand side, and we call it the great-aunt
measure. This measure gives a spine decomposition of the first survivor (individual $i = 1$), in
such a way that the trees sprouting from the younger offshoots are still independent,
but are no longer conditioned on survival.

In the continuous-state case, the great-aunt measure will be a measure $\rho^0$ on $\mathbb{R}_+$, and we
will be able to define the genealogy thanks to independent CSB processes starting from the
masses of the atoms of $\rho^0$. In the subcritical and critical cases, this can be done using a single
path of a Lévy process with no negative jumps and Laplace exponent $\psi$. In the supercritical
case a concatenation of excursion paths will have to be used. Using the continuous great-aunt
measure $\rho^0$, we characterize the genealogy of an infinite CSB tree with branching mechanism $\psi$
via the height function $H^\star$ whose value at an individual in the population can be decomposed
into the level on the infinite spine at which the subtree containing this individual branches off,
and the relative height of this individual within its subtree. We construct $(B_i^\varepsilon; i \ge 1)$ from the
height process $H^\star$. Discretizing the population by considering only the points whose coalescence
times are greater than some fixed $\varepsilon > 0$ translates into considering only the excursions of $H^\star$
from level 0 with a depth greater than $\varepsilon$. From these excursions we can obtain the process
$((A_i^\varepsilon, N_i^\varepsilon); i \ge 1)$ where $A_i^\varepsilon$ is the depth of the $i$-th such excursion and $N_i^\varepsilon$ is the number of
future excursions with the exact same depth. It turns out that a specific functional of this
process is Markov, and we construct $(B_i^\varepsilon; i \ge 1)$ from it, show that it is Markov as well, and
give its transition law.

Now, in order to prove convergence in law of the discretized version of $(B_i; i \ge 1)$ to
$(B_i^\varepsilon; i \ge 1)$, we make use of the great-aunt measures $\rho$ and $\rho^0$ describing the spine decomposition of the
first surviving individual. In the discrete case, the spine decomposition $(\rho(\{n\}); 0 \le n \le h)$
truncated at level $h$ has the same law as the spine decomposition of a planar embedding of a
BGW tree conditioned on surviving up to generation $h$, where the first survivor is defined in the
usual depth-first search order (see Fig. 2). Using a well-known random walk representation of
BGW trees [3], we then prove (under usual conditions ensuring the convergence of the random
walk to a spectrally positive Lévy process $X$ with Laplace exponent $\psi$) the convergence (in the
vague topology) of the great-aunt measure $\rho$ to the measure $\rho^0$ on $\mathbb{R}_+$ defined by

$$\rho^0(dx) := \beta\,dx + \sum_{t : \Delta_t > 0} \Delta_t\,\delta_t(dx),$$

where $\beta$ is the Gaussian coefficient of $X$, and $(t, \Delta_t)$ is a Poisson point measure with intensity

$$\Big( e^{v(t) r} \int_{(r,\infty)} e^{-v(t) z}\,\Lambda(dz) \Big)\,dt\,dr, \qquad r > 0,$$

where $\Lambda$ is the Lévy measure of $X$, and $v$ the inverse of the decreasing function
$\lambda \mapsto \int_{[\lambda, +\infty)} du/\psi(u)$.

This is exactly the same measure $\rho^0$ we obtain from the spine decomposition in the continuous
case. Since the transition law of $(B_i; i \ge 1)$ can be represented as a functional of $\rho$ and the
transition law of $(B_i^\varepsilon; i \ge 1)$ as a functional of $\rho^0$, this will lead to our claim.

Finally, in the very last section, we give two simple applications of our results in the discrete 
case. First, we prove that in the linear fractional case, the coalescent point process is a sequence 
of i.i.d. random variables. Related results can be found in [29]. Second, in the subcritical case, 
we use the monotone embedding to display the law, in quasi-stationary state, of the total 
population size (Yaglom's limit) jointly with the time to most recent common ancestor of this 
whole population. 
2 Doubly infinite embedding and the coalescent point process



2.1 The discrete model 

We will start off with a monotone planar embedding of an infinitely old BGW process with
arbitrarily large population size. Let $(\xi(n,i))$ be a doubly indexed sequence of integers, where
$n$ is indexed by $\mathbb{Z} = \{\dots, -2, -1, 0, 1, 2, \dots\}$ and $i$ is indexed by $\mathbb{N} = \{1, 2, 3, \dots\}$. The index
pair $(n, i)$ represents the $i$-th individual in generation $n$, and $\xi(n, i)$ provides the number of
offspring of this individual.



Figure 1: A doubly infinite embedding of quasi-stationary Bienaymé-Galton-Watson
genealogies. Horizontal labels are individuals and vertical labels are generations. Left panel:
the complete embedding of an infinite BGW tree. Right panel: coalescence times between
consecutive individuals of generation 0. Here, the coalescent point process takes the value
$(1, 1, 2, 1, 6, 1, 3, 1, 3, \dots)$.



We endow the populations with the following genealogy. Individual $(n, i)$ has mother $(n-1, j)$
if

$$\sum_{k=1}^{j-1} \xi(n-1, k) < i \le \sum_{k=1}^{j} \xi(n-1, k).$$
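The mother rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `mother_index` and the list encoding of one generation's offspring counts are hypothetical, not notation from the paper.

```python
from bisect import bisect_left
from itertools import accumulate


def mother_index(offspring_counts, i):
    """Return the 1-based index j of the mother of individual i.

    offspring_counts[k-1] plays the role of xi(n-1, k); this finite list is an
    illustrative stand-in for one (infinite) generation of the embedding.
    """
    if i < 1:
        raise ValueError("individual indices start at 1")
    cumulative = list(accumulate(offspring_counts))
    if not cumulative or i > cumulative[-1]:
        raise ValueError("not enough offspring listed to reach individual i")
    # Smallest j with xi(n-1,1) + ... + xi(n-1,j) >= i, which is exactly the
    # condition sum_{k<j} xi < i <= sum_{k<=j} xi.
    return bisect_left(cumulative, i) + 1
```

Because the rule assigns daughters to mothers in increasing order of index, lines of descent never cross, which is the monotonicity property the embedding relies on.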

From now on, we focus on the ancestry of the population at time 0, that we will call standing
population, and we let $a_i(n)$ denote the index of the ancestor of individual $(0, i)$ in generation
$-n$. In particular, $a_i(1) := \min\{j \ge 1 : \sum_{k=1}^{j} \xi(-1, k) \ge i\}$. Our main goal is to describe the
law of the times of coalescence $C_{i,j}$ of individuals $(0,i)$ and $(0,j)$, that is,

$$C_{i,j} := \min\{n \ge 1 : a_i(n) = a_j(n)\},$$

where it is understood that $\min\emptyset = +\infty$. Defining

$$A_i := C_{i,i+1},$$

it is easily seen that by construction, for any $i < j$, $C_{i,j} = \max\{A_i, A_{i+1}, \dots, A_{j-1}\}$. Thus,
the sequence $A_1, A_2, \dots$ contains all the information about the (unlabelled) genealogy of the
current population and is called coalescent point process, as in [28] (see Fig. 1).
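As a small illustration of how the branch lengths determine every pairwise coalescence time, the following Python sketch evaluates the maximum formula on the values read from Fig. 1. The function name `coalescence_time` and the 0-based list storage are assumptions made for the example.

```python
def coalescence_time(branch_lengths, i, j):
    """Coalescence time of individuals i < j of generation 0.

    branch_lengths[k-1] stores A_k (hypothetical 0-based storage of the
    1-indexed sequence), and the result is max(A_i, ..., A_{j-1}).
    """
    if not 1 <= i < j <= len(branch_lengths) + 1:
        raise ValueError("need 1 <= i < j <= len(branch_lengths) + 1")
    return max(branch_lengths[i - 1 : j - 1])


# Branch lengths of the coalescent point process shown in Fig. 1.
FIG1_BRANCH_LENGTHS = [1, 1, 2, 1, 6, 1, 3, 1, 3]
```

For instance, individuals 3 and 7 coalesce at the maximum of $A_3, \dots, A_6$, which is driven by the deep branch $A_5 = 6$.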

We assume that there is a random variable $\xi$ with values in $\mathbb{Z}_+ = \{0, 1, 2, \dots\}$ and
probability generating function (p.g.f.) $f$, such that all r.v.'s $\xi(n, i)$ are i.i.d. and distributed as $\xi$. As a
consequence, if $Z^{(n,i)}(k)$ denotes the number of descendants of $(n, i)$ at generation $n + k$, then
the processes $(Z^{(n,i)}(k); k \ge 0)$ are identically distributed Bienaymé-Galton-Watson (BGW)
processes starting from 1. We will refer to $Z$ as some generic BGW process with this
distribution and we will denote by $f_n$ the $n$-th iterate of $f$ and by $p_n = 1 - f_n(0)$ the probability that
$Z_n \ne 0$. We need some extra notation in this setting.

We define $\zeta_n$ as the number of individuals at generation 1 having alive descendants at
generation $n$. In particular, $\zeta_n$ has the law of

$$\sum_{k=1}^{\xi} \epsilon_k,$$

where the $\epsilon_k$ are i.i.d. Bernoulli r.v. with success probability $p_{n-1}$, independent of $\xi$. We also
define $\zeta'_n$ as the r.v. distributed as $\zeta_n - 1$ conditional on $\zeta_n \ne 0$.
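For a concrete offspring law, the survival probabilities and the laws of these two variables can be computed directly. The sketch below assumes an offspring distribution with finite support given as a list `pmf` (with `pmf[j]` the probability of $j$ offspring); the function names are hypothetical and chosen only for this illustration.

```python
from math import comb


def survival_probs(pmf, n_max):
    """Return [p_0, ..., p_{n_max}], where p_n = 1 - f_n(0)."""
    def f(s):
        return sum(q * s**j for j, q in enumerate(pmf))

    probs, s = [1.0], 0.0
    for _ in range(n_max):
        s = f(s)                 # s becomes f_n(0), the extinction probability
        probs.append(1.0 - s)
    return probs


def zeta_pmf(pmf, n):
    """Law of zeta_n: Bernoulli(p_{n-1}) thinning of the offspring number."""
    p = survival_probs(pmf, n - 1)[n - 1]
    out = [0.0] * len(pmf)
    for j, q in enumerate(pmf):
        for m in range(j + 1):
            out[m] += q * comb(j, m) * p**m * (1.0 - p) ** (j - m)
    return out


def zeta_prime_pmf(pmf, n):
    """Law of zeta'_n: zeta_n - 1 conditional on zeta_n != 0."""
    z = zeta_pmf(pmf, n)
    alive = 1.0 - z[0]
    return [z[m + 1] / alive for m in range(len(z) - 1)]
```

A useful sanity check is that the probability that the thinned count vanishes equals $f_n(0)$, since no daughter of the root having descendants at generation $n$ is the same event as extinction by generation $n$.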

2.2 The coalescent point process: main results 

Recall that $A_i$ denotes the time of coalescence of individuals $(0, i)$ and $(0, i+1)$, or $i$-th branch
length of the coalescent. To describe the law of $(A_i; i \ge 1)$ we need an additional process.

Let $D_i(n)$ be the number of daughters of $(-n, a_i(n))$, distinct from $(-n+1, a_i(n-1))$,
having descendants in $\{(0, j); j > i\}$. In other words, $D_i(n)$ is the number of younger surviving
offshoots of $(-n, a_i(n))$ not counting the lineage of $(0, i)$ itself. Letting

$$\mathcal{D}_i(n) := \{\text{daughters of } (-n, a_i(n)) \text{ with descendants in } \{(0,j); j \ge i\}\},$$

we have

$$D_i(n) = \#\mathcal{D}_i(n) - 1.$$

We set $D_0(n) := 0$ for all $n$. We now provide the law of this process, and its relationship to
$(A_i; i \ge 1)$. We also set $A_0 := +\infty$, which is in agreement with $\min\emptyset = +\infty$.

Theorem 2.1 The $i$-th branch length is a simple functional of $(D_i; i \ge 0)$:

$$A_i = \min\{n \ge 1 : D_i(n) \ne 0\}.$$

In addition, the sequence-valued chain $(D_i; i \ge 0)$ is a Markov chain started at the null sequence.
For any sequence of nonnegative integers $(d_n)_{n \ge 0}$, the law of $D_{i+1}$ conditionally given $D_i(n) =
d_n$ for all $n$, is given by the following transition. We have $D_{i+1}(n) = d_n$ for all $n > A_i$,
$D_{i+1}(A_i) = d_{A_i} - 1$ and the r.v.'s $D_{i+1}(n)$, $1 \le n < A_i$, are independent, distributed as $\zeta'_n$. In
particular, the law of $A_1$ is given by

$$P(A_1 > n) = \prod_{k=1}^{n} P(\zeta'_k = 0) = \frac{1}{1 - f_n(0)} \prod_{k=1}^{n} f'(f_{k-1}(0)) = \frac{f_n'(0)}{1 - f_n(0)} = P(Z_n = 1 \mid Z_n \ne 0).$$
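The chain of equalities for the tail of the first branch length can be checked numerically. The sketch below is an illustration only: it takes the p.g.f. and its derivative as callables, and uses a Poisson offspring law with mean 1 purely as an example (that choice, and all function names, are assumptions rather than anything prescribed by the text). It uses the identity $P(\zeta'_k = 0) = p_{k-1} f'(f_{k-1}(0)) / p_k$, which follows from the thinning description of $\zeta_k$.

```python
import math


def tail_of_first_branch(f, f_prime, n_max):
    """Return two lists holding P(A_1 > n) for n = 1..n_max.

    closed_form[n-1]  = f_n'(0) / (1 - f_n(0))
    product_form[n-1] = prod_{k<=n} P(zeta'_k = 0)
    """
    closed_form, product_form = [], []
    extinct = 0.0       # f_{k-1}(0), starting from f_0(0) = 0
    derivative = 1.0    # f_k'(0), built up by the chain rule
    product = 1.0
    for _ in range(n_max):
        p_prev = 1.0 - extinct
        slope = f_prime(extinct)      # f'(f_{k-1}(0))
        extinct = f(extinct)          # f_k(0)
        p_now = 1.0 - extinct
        derivative *= slope
        product *= p_prev * slope / p_now   # P(zeta'_k = 0)
        closed_form.append(derivative / p_now)
        product_form.append(product)
    return closed_form, product_form


def poisson_pgf(s, mean=1.0):
    """P.g.f. of a Poisson offspring law (example choice)."""
    return math.exp(mean * (s - 1.0))


def poisson_pgf_prime(s, mean=1.0):
    """Derivative of the Poisson p.g.f."""
    return mean * math.exp(mean * (s - 1.0))
```

Calling `tail_of_first_branch(poisson_pgf, poisson_pgf_prime, 20)` returns the two evaluations side by side; they agree up to rounding because the product telescopes.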

Proof. The following series of equivalences proves that $A_i$ is the level of the first nonzero
term of the sequence $D_i$.

$$\begin{aligned}
A_i > n &\iff \forall k \le n,\ a_i(k) \ne a_{i+1}(k) \\
&\iff \forall k \le n,\ \forall j > i,\ a_i(k) \ne a_j(k) \\
&\iff \forall k \le n,\ (-k, a_i(k)) \text{ has no descendants in } \{(0,j) : j > i\} \\
&\iff \forall k \le n,\ D_i(k) = 0.
\end{aligned}$$

Thanks to this last result, we get

$$a_i(A_i) = a_{i+1}(A_i) \quad \text{and} \quad a_i(A_i - 1) \ne a_{i+1}(A_i - 1).$$

In particular, $(-A_i + 1, a_i(A_i - 1))$ has no descendants in $\{(0, j) : j \ge i+1\}$, so that

$$\mathcal{D}_{i+1}(A_i) = \mathcal{D}_i(A_i) \setminus \{(-A_i + 1, a_i(A_i - 1))\},$$

and $\#\mathcal{D}_{i+1}(A_i) = \#\mathcal{D}_i(A_i) - 1$, that is, $D_{i+1}(A_i) = D_i(A_i) - 1$.

Let us deal with the case $n > A_i$. By definition of $A_i$, $a_i(n) = a_{i+1}(n)$ for any $n \ge
A_i$. As a consequence, for any $n > A_i$, each daughter of $(-n, a_{i+1}(n))$ has descendants in
$\{(0, j); j \ge i+1\}$ iff she has descendants in $\{(0, j); j \ge i\}$. In other words, $\mathcal{D}_{i+1}(n) = \mathcal{D}_i(n)$,
and $D_{i+1}(n) = D_i(n)$.

Now we deal with the case $n < A_i$. Set

$$E_i := \{D_j(n) = d_{j,n},\ \forall n \ge 1,\ j \le i\},$$

where the $(d_{j,n})_{n \ge 1, j \le i}$ are fixed integer numbers, and let $A_i$ be the value of the coalescence
time of $i$ and $i+1$ conditional on $E_i$, that is, $A_i = \min\{n \ge 1 : d_{i,n} \ne 0\}$. Now let $T(n,i)$ be
the tree descending from $(-n, a_i(n))$ and set

$$I(n,i) := \min\{j \le i : a_j(n) = a_i(n)\}.$$

Observe that the (unlabeled) tree $T(n, i)$ has the law of a BGW tree conditioned on having at
least one descendant at generation 0. Now because the event $E_i$ only concerns the descendants
of daughters of ancestors of $(-A_i + 1, a_{i+1}(A_i - 1))$, the law of $T(A_i - 1, i+1)$ conditional on
$E_i$ is still the law of a BGW tree conditioned on having at least one descendant at generation
0.

Also notice that conditional on $E_i$, $I(A_i - 1, i+1) = i+1$. As a consequence, $I(n, i+1) = i+1$
for any $n < A_i$, so that for any $n < A_i$, conditional on $E_i$,

$$\mathcal{D}_{i+1}(n) := \{\text{daughters of } (-n, a_{i+1}(n)) \text{ with descendants at generation } 0\}.$$

The result follows recalling the law of $T(A_i - 1, i+1)$ conditional on $E_i$.
Recall that $D_0$ is a null sequence, $A_0 = +\infty$, and that $D_1(n) = \zeta'_n$ for $n \ge 1$. This infinite
sequence of values in $(D_1(n), n \ge 1)$ contains information on the ancestral relationship of the
individual $(0, 1)$ and an arbitrarily large number of individuals in the standing population,
going back arbitrarily far into the past. Using the process $(D_i; i \ge 1)$ we next define a sequence
$(B_i; i \ge 1)$ of finite point mass measures, which contains the minimal amount of information
needed to reconstruct $A_1, A_2, \dots$ while remaining Markov. We do this for two reasons. First, if
we are only interested in the ancestral relationship of finitely many individuals in the standing 
population, there is no need to keep track of an infinite sequence of values. Second, when we 
consider a rescaled limit of the BGW process to a CSB process, we will need to work with a 
sparser representation. 

The main distinction between these two processes is that while $D_i$ contains information on
the ancestral relationship of $(0,i)$ and $(0,j)$ for both $1 \le j \le i+1$ and $j > i+1$, $B_i$ will
only contain information about the ancestral relationship of $(0, i)$ and $(0, j)$ for $1 \le j \le i+1$.
In other words, if say $\max\{A_1, \dots, A_i\} = n$ then $B_i(\{m\}) = 0$ for all $m > n$. Moreover, $B_i$ is
defined directly from $D_i$ by letting $B_i(\{m\}) = D_i(m)$ for all $m \le n$. In particular, $B_1$ will have
a single non-zero point mass at level $A_1$, and $B_1(\{A_1\}) = D_1(A_1)$.

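We are now ready to define $(B_i; i \ge 1)$ which we call the coalescent point process with
multiplicities. Let $(B_i; i \ge 0)$ be a sequence of finite point measures, started at the null measure,
defined from $(D_i; i \ge 0)$ recursively as follows. For any point measure $b = \sum_{n \ge 1} b_n \delta_n$, let $s(b)$
denote the minimum of the support of $b$, that is,

$$s(b) := \min\{n \ge 1 : b_n \ne 0\}$$

with the convention that $s(b) = +\infty$ if $b$ is the null measure. If $B_i = \sum_{n \ge 1} b_{n,i} \delta_n$ for some
nonnegative integers $\{b_{n,i}\}_{n \ge 1}$, let

$$a_{1,i} := s(B_i), \qquad B_i^* := B_i - \delta_{a_{1,i}}, \qquad a_{1,i}^* := s(B_i^*).$$

Then, define

$$B_{i+1} := \begin{cases} B_i^* + D_{i+1}(A_{i+1})\,\delta_{A_{i+1}} & \text{if } A_{i+1} < a_{1,i}^* \text{ and } A_{i+1} \ne a_{1,i}, \\ B_i^* & \text{otherwise.} \end{cases}$$

Note that, by Theorem 2.1, we have $D_{i+1}(A_i) = D_i(A_i) - 1$, so by this definition $B_{i+1} =
\sum_{n \ge 1} b_{n,i+1} \delta_n$ where the nonnegative integers $\{b_{n,i+1}\}_{n \ge 1}$ satisfy

$$b_{n,i+1} = \begin{cases} D_{i+1}(n) & \text{if } n \text{ is such that } b_{n,i} \ne 0, \\ D_{i+1}(A_{i+1}) & \text{if } n = A_{i+1} \text{ and } A_{i+1} < a_{1,i}^* \text{ and } A_{i+1} \ne a_{1,i}, \\ 0 & \text{for all other } n. \end{cases}$$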
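The recursive definition can be traced by hand on the example of Fig. 1, or with the short Python sketch below. The encoding is an assumption made for illustration: a point measure is stored as a dictionary mapping each level to its multiplicity, and the function names are hypothetical. The weights $D_i(A_i)$ passed to the recursion are read off from the values of $B_1, \dots, B_9$ quoted for Fig. 1 in the Introduction; they are only used at steps where a new point mass is actually added.

```python
import math


def next_B(B, A_next, weight_next):
    """One step B_i -> B_{i+1} of the recursive definition.

    B           : dict level -> multiplicity, representing B_i
    A_next      : the branch length A_{i+1}
    weight_next : the weight D_{i+1}(A_{i+1}) of a possible new point mass
    """
    B_star = dict(B)
    a1 = min(B_star) if B_star else math.inf         # a_{1,i} = s(B_i)
    if B_star:
        B_star[a1] -= 1                               # B_i^* = B_i - delta_{a_{1,i}}
        if B_star[a1] == 0:
            del B_star[a1]
    a1_star = min(B_star) if B_star else math.inf    # a_{1,i}^* = s(B_i^*)
    if A_next < a1_star and A_next != a1:
        B_star[A_next] = weight_next                  # add the new point mass
    return B_star


def build_B_sequence(branch_lengths, weights):
    """Return [B_1, B_2, ...] starting from the null measure B_0."""
    B, out = {}, []
    for A_next, w in zip(branch_lengths, weights):
        B = next_B(B, A_next, w)
        out.append(B)
    return out


# Values read from Fig. 1: A_1..A_9 and the weights D_i(A_i).
FIG1_A = [1, 1, 2, 1, 6, 1, 3, 1, 3]
FIG1_WEIGHTS = [2, 1, 1, 1, 1, 1, 2, 1, 1]
```

On this input the recursion reproduces the sequence $2\delta_1, \delta_1, \delta_2, \delta_1, \delta_6, \delta_1, 2\delta_3, \delta_1 + \delta_3, \delta_3$, and the smallest level of each measure is the corresponding branch length.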

Roughly, $(B_i; i \ge 1)$ records the ancestral information in the following way. $B_0$ is a null
measure, and $B_1$ will contain a single point mass $B_1 = D_1(A_1)\delta_{A_1}$ recording the coalescence
time $A_1$ for individuals $(0, 1)$ and $(0, 2)$ in the location of its point mass. Since the last common
ancestor of $(0,1)$ and $(0,2)$ may also be the last common ancestor of $(0,1)$, $(0,2)$ and some
other individual $(0, j)$ for $j > 2$, the multiplicity of this point mass will record the number of
its future appearances in the coalescent point process. Recursively in every step, say $i+1$, this
record of point masses will need to be updated from $B_i$ to $B_i^*$, since by moving one individual
to the right we are no longer recording the last common ancestor of two previous individuals,
and the number of future appearances of their last common ancestor by definition goes down
by 1. In addition, at step $i+1$ we also need to record the coalescence time for the last common
ancestor of $(0, i+1)$ and $(0, i+2)$ with the number of its future appearances. This is done by
taking the updated version $B_i^*$ and adding a new point mass $D_{i+1}(A_{i+1})\delta_{A_{i+1}}$ to create $B_{i+1}$.
Because of the monotone embedding of the BGW as a planar tree, it is not possible for the
level $A_{i+1}$ of this new point mass to be greater than any of the common ancestors of $(0, j)$ and
$(0, k)$ for $1 \le j < k \le i+1$ unless the multiplicities of these ancestors are depleted at this step
and no longer appear in the updated $B_i^*$, so $A_{i+1} \le a_{1,i}^*$, as will be seen in the proof of the next
Theorem. Moreover, if the last common ancestor of $(0, i+1)$ and $(0, i+2)$ is the last common
ancestor of $(0, i+1)$ and $(0, k)$ for some $1 \le k < i+1$ then the common ancestry of $(0, i+2)$
and $(0, i+1)$ will be counted in the multiplicity of the mass at $A_{i+1}$ in $B_i$ and the updated
version $B_i^*$ will have non-zero multiplicity at $A_{i+1}$, so $A_{i+1} = a_{1,i}^*$ and there will be no need
to add a new point mass to create $B_{i+1}$. In addition, if this ancestor is also the last common
ancestor of $(0, i)$ and $(0, i+1)$, so $A_{i+1} = a_{1,i}$, the count for this ancestor cannot be 1 in $B_i$, so
in this case we have $A_{i+1} = a_{1,i} = a_{1,i}^*$ and there is no need to add a new point mass in $B_{i+1}$.
We now provide the law of this point measure process and its relationship to $(A_i; i \ge 1)$.

Theorem 2.2 The $i$-th branch length is the smallest point mass in $B_i$:

$$A_i = s(B_i) = \min\{n \ge 1 : B_i(\{n\}) \ne 0\}.$$

In addition, the sequence of finite point measures $(B_i; i \ge 0)$ is a Markov chain started at the
null measure, such that for any finite point measure $b = \sum_{n \ge 1} b_n \delta_n$, with $b_n \in \mathbb{N} \cup \{0\}$, the
law of $B_{i+1}$ conditionally given $B_i = b$, is given by the following transition. Let $a_1 := s(b)$,
$b^* := b - \delta_{a_1}$, and $a_1^* := s(b^*)$. Let $(A, N)$ be distributed as $(A_1, \zeta'_{A_1})$. Then



Proof. Instead of the full information $(D_1(n), n \ge 1)$ this sequence starts with a single point
measure

$$B_1 = D_1(A_1)\,\delta_{A_1}$$

and at each step it proceeds by changing the weights of the existing point masses and by adding
at most one new point mass.

It is clear from the recursive definition of $(B_i; i \ge 1)$ that for any $i \ge 1$ if $b_{n,i} \ne 0$ then
$b_{n,i} = D_i(n)$. We first show that for any $i \ge 1$ we have $a_{1,i} = A_i$ and $b_{a_{1,i},i} = D_i(A_i)$. If we show
that $b_{A_i,i} \ne 0$, then since all other nonzero weights in $B_i$ satisfy $b_{n,i} = D_i(n)$, the definition of
$A_i$ will immediately imply that $a_{1,i} = A_i$ and $b_{a_{1,i},i} = D_i(A_i)$. We do this by induction. The
claim is clearly true for $i = 1$, so let us assume it is true for an arbitrary $i \ge 1$.

Consider what the transition rule for $D_i$ tells us about the relationship between $A_{i+1}$ and
$A_i$. Recall, for all $n > A_i$ we have $D_{i+1}(n) = D_i(n)$, $D_{i+1}(A_i) = D_i(A_i) - 1$, and for all $n < A_i$
we have $D_{i+1}(n) = \zeta'_n$ from r.v.'s drawn independently of $D_i$. So,

$$\begin{aligned}
A_{i+1} < A_i &\iff \exists n < A_i \text{ s.t. } \zeta'_n \ne 0, \\
A_{i+1} = A_i &\iff D_i(A_i) > 1 \text{ and } \zeta'_n = 0,\ \forall n < A_i, \\
A_{i+1} > A_i &\iff D_i(A_i) = 1 \text{ and } \zeta'_n = 0,\ \forall n < A_i.
\end{aligned}$$
In the first case, $A_{i+1} < A_i = a_{1,i} \le a_{1,i}^*$, so the point mass at $A_{i+1}$ will be added to $B_{i+1}$ and
$b_{A_{i+1},i+1} \ne 0$. In the second case, $b_{A_i,i+1} = D_{i+1}(A_i) = D_i(A_i) - 1 > 0$, and since $A_{i+1} = A_i$
we have $b_{A_{i+1},i+1} \ne 0$. In the third case, $b_{A_i,i+1} = D_{i+1}(A_i) = D_i(A_i) - 1 = 0$, and also
$b_{a_{1,i},i} = D_i(A_i) = 1$ implies $a_{1,i}^* = \min\{n > a_{1,i} : b_{n,i} \ne 0\} = \min\{n > A_i : b_{n,i} \ne 0\}$. Note
that for all $n > A_i$ for which $b_{n,i} \ne 0$ we also have $b_{n,i+1} = b_{n,i} \ne 0$, and since $b_{A_i,i+1} = 0$, the
smallest value of $n$ with mass $B_{i+1}(\{n\}) > 0$ before we potentially add a new mass is precisely at
$a_{1,i}^* = \min\{n > A_i : b_{n,i} \ne 0\}$. In this case $a_{1,i}^* > a_{1,i} = A_i$ so we have $D_{i+1}(a_{1,i}^*) = D_i(a_{1,i}^*) \ne 0$,
so by definition of $A_{i+1}$ we must have $A_{i+1} \le a_{1,i}^*$. In case $A_{i+1} < a_{1,i}^*$ the point mass at $A_{i+1}$
will be added to $B_{i+1}$ and $b_{A_{i+1},i+1} \ne 0$. In case $A_{i+1} = a_{1,i}^*$ no new mass is added and
the smallest of the nonzero masses in $B_{i+1}$ is at $a_{1,i}^* = A_{i+1}$, as stated earlier, and again
$b_{A_{i+1},i+1} = b_{a_{1,i}^*,i+1} = b_{a_{1,i}^*,i} \ne 0$. Hence, we have shown by induction that $b_{A_i,i} \ne 0$ for all
$i \ge 1$, so that $a_{1,i} = A_i$ and $b_{a_{1,i},i} = D_i(A_i)$.

Now consider the transition rule for $B_{i+1}$, conditionally given $B_i$. For the already existing
mass in $B_i$ the changes in weights are given by the transition rule for $D_i$ to be

$$\sum_{n \ge 1} D_{i+1}(n)\,1_{\{b_{n,i} \ne 0\}}\,\delta_n = (D_i(A_i) - 1)\,\delta_{A_i} + \sum_{n > A_i} D_i(n)\,1_{\{b_{n,i} \ne 0\}}\,\delta_n = B_i - \delta_{a_{1,i}} = B_i^*.$$

We have an addition of a new point mass iff $A_{i+1} < a_{1,i}^*$. Since $a_{1,i}^* \ge a_{1,i} = A_i$ this happens
iff either

(i) $A_{i+1} < A_i$, or

(ii) $b_{a_{1,i},i} = 1$ and $A_i < A_{i+1} < a_{1,i}^*$.

The reason $A_i = A_{i+1}$ is not included in (ii) is that if $b_{a_{1,i},i} = D_i(A_i) = 1$ then $D_{i+1}(A_i) =
D_i(A_i) - 1 = 0$. Let $\{\zeta'_n\}_{n \ge 1}$ be a sequence of r.v.'s drawn independently from $D_i$. Then, by
the transition rule for $D_i$

(i) holds iff $\exists n < A_i$ s.t. $\zeta'_n \ne 0$,
then, $A_{i+1} := \min\{n < A_i : \zeta'_n \ne 0\}$

(ii) holds iff $b_{a_{1,i},i} = 1$, $\forall n < A_i$ $\zeta'_n = 0$, and $\exists n < a_{1,i}^*$ s.t. $\zeta'_n \ne 0$,
then, $A_{i+1} := \min\{A_i < n < a_{1,i}^* : \zeta'_n \ne 0\}$.

In case (i) holds it is clear that the weight $D_{i+1}(A_{i+1})$ is distributed as $\zeta'_{A_1}$ conditional on
$A_1 < A_i = a_{1,i}$ and that this weight is independent of $(B_j; 1 \le j \le i)$. We next argue that this
is also true in case (ii) holds. In this case $A_i < A_{i+1} < a_{1,i}^*$ and since $D_{i+1}(A_{i+1}) \ne 0$ we must
also have $D_i(A_{i+1}) \ne 0$ because the transition rule for $D_{i+1}$ only allows zero entries to become
nonzero for $n < A_i$. Moreover $D_i(A_{i+1}) = D_{i+1}(A_{i+1})$ because all entries for $n > A_i$ remain
unchanged from step $i$ to $i+1$. Hence, we must have $D_n(A_{i+1}) = D_{i+1}(A_{i+1})$ for all $k < n \le i$
where

$$k := \max\{0 \le j < i : A_j \ge A_{i+1}\}$$

with the convention that $A_0 := +\infty$.

We show that As 7^ A+i if A < A+i < &i j- If we had Ac = A+i then &A fe ,fc = Dk(Ak) ^ 
0. The transition rule for D^i implies D^i^A^) = D^A^) — 1. Since A n < A+i = A^ 
for all k + 1 < n < i, iteratively applying the transition rule for D^2i ■ ■ ■ , Di+i implies 
D k+2 (A k ) = D k+l (A k ),...,D i+1 (A k ) = Di(A k ). Thus 

D k (A k ) - 1 = D k+l (Ak) = ■■■ = Di(A k ) = D l+1 (A k ) = A+i(A+i) + 
and bA k ,k = A(^4fc) > 1. Then, by definition of the weights for Bk+i, . . . , A+i we have that 
bA k ,k+i = Dk+i(Ak) 7^ and the same weight iteratively remains as bA k ,n = D n (Ak) for all 
k + l<n<i + l. However, bA i+1 ,i = &A fe ,i contradicts our assumption that A i+ i < a| i . 

Thus we must have A^ > Ai + \. Since A(^fc) / we also must have A^ > a\ v ^From the 
definition of weights for Bi and Ai < a* i we have 

b a * ; ^ 4> 3k < i s.t. at = Ah> and A n < Aw for all k' < n < i 

since the only point mass in a step n not existing in the previous step must be placed at A n . 
Now, Ak' = a\ i > Ai + \ and the fact that we defined k = max{0 < j < i : Aj > Ai + i} implies 
that k = k', hence A^ = a* i . 

Since A^ > Ai + \ we must have A (4+1) = 0. By the same argument as above of iteratively 
applying the transition rule for A+2 5 • • • , A+i the weights at 4+1 satisfy 

D k+1 (A i+1 ) = ■■■ = A 04+1 ) = D i+1 (A i+1 ) + 

Let us now consider the distribution of A+i(4+i) conditional on the value of k. Since 
Ai + \ < Ak and A+l(4+i) ^ by the transition rule for A+i the value of the weight 
A+i(4+i) is a r - v - distributed as C'a conditional on A\ < A^ and it is drawn independently 
of (Bj-,1 < j < k). So, A+l(^j+i) = A+l(4+i) is distributed as C'a conditional on Ai < 
Ak = a\ i and is independent of (Bj-,1 < j < k). Since A n < 4+1 for all k < n < i, the 
transition rule for A+i)--->A implies that all of the subsequently added point masses in 
-Bfc+i, . . . , Bi are independent of A+iC4+l)> so the value of A+i(4+i) is independent of 
(Bj-,1 < j < i). Finally, integrating over k, we have that A+i(4+i) is distributed as ( Al 
conditional on A\ < a\ i and is independent of (Bj-, 1 < j < i). 

We have now shown that when a new point mass is added at A^i, either 4+1 := min{n < 
Ai ■ Q'n 7^ 0}' or b aii ,% = 1 and 4+1 : = min{A; < n < a[ i : C,' n ^ 0}, where the sequence 
{C' n }n>i are r.v.'s drawn independently from (Bj;j <i). In either case, the weight of the new 
point mass at 4+1 is distributed as (f A conditional on A\ < a* i . 

Let A := min{n > 1 : Q' n ^ 0} and N := Q' A . Since {Cn}n>i are independent of 
(Bj-,1 < j < i), so are A and N, and by Theorem 12. 1\ (A,N) is distributed as (4l>Ca )■ 
Conditional on the value of Bi, we have that (i) holds iff A < Ai, while (ii) holds iff b ai 4 j = 1 
and Ai < A < a\ i- Putting (i) and (ii) together with the definition of a\ ^ we have, condition- 
ally on (Bj\ 1 < j < i), an addition of a new point mass iff A < a* i and A ^ a\^. Moreover, 
the weight of the newly added point mass is distributed as Q' A conditional on A\ < a* i5 
or equivalently it is distributed as N conditional on A < We also showed that given 

[Bj \ 1 < j < i) the point masses existing in Bi change in the next step to produce a reweighted 
point mass measure equal to Bi — 5 ai . Altogether, given (Bj-, 1 < j < i), the next step of the 
sequence depends on Bi only, with the transition rule that if A < a* ^ and A ^ a\ t i in the next 
step we have Bi + \ = Bi — 5 ai .+NSa, and otherwise in the next step we have Bi + \ = Bi — 5 ai ..□ 

In the course of the above proof, we have also shown that if $t_1 := \min\{t \ge 1 : A_t > A_1\}$ 
and $N_1 := \#\{1 \le j < t_1 : A_j = A_1\}$, then $N_1 = D_1(A_1)$, because in order to add the first new 
point mass at a level greater than $A_1$, we must in the course of $1 \le j < t_1$ have exactly enough 
steps at which $A_j = A_1$ that will exhaust all of the weight $D_1(A_1)$ of the point mass at $A_1$. 

Analogously for each $i \ge 1$, if we let 

\[ t_i := \min\{t \ge i : A_t > A_i\}, \quad\text{and}\quad N_i := \#\{i \le j < t_i : A_j = A_i\}, \]

then $D_i(A_i) = N_i$. If furthermore for each $i \ge 1$ and $k \ge i$ we define 

\[ N_{ik} := \#\{k \le j < t_i : A_j = A_i\}, \]

then by a similar argument for $k \ge i$ we have $N_{ik} = D_k(A_i)$. Note that $N_{ii} = N_i$. Then, for 
the sequence of finite point measures we have that for all k > 1 

i<fc 

It is easily checked that $B_{k+1}$ correctly updates the weight of each existing point mass from 
$B_k$, and allows a new point mass to be added only if it is in a location smaller than all mass 
existing in $B_k$ whose weights in $B_{k+1}$ remain non-zero. We will see this formula again when 
we discuss the coalescent process in the continuous case. 
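
The separation times and counters above are easy to compute on a finite stretch of coalescence times. The following is a minimal sketch, not part of the original argument: the function names `separation_time` and `counter` are hypothetical, indices are 1-based as in the text, and when the sample ends before $A_t$ exceeds $A_i$ the count is simply truncated at the end of the sample (an assumption made only so that the code terminates).

```python
def separation_time(A, i):
    """t_i = min{t >= i : A_t > A_i}, with 1-based indices.

    Returns None when the finite sample A ends before A_i is exceeded.
    """
    for t in range(i, len(A) + 1):
        if A[t - 1] > A[i - 1]:
            return t
    return None


def counter(A, i, k):
    """N_{ik} = #{k <= j < t_i : A_j = A_i} for k >= i, so that N_{ii} = N_i."""
    t_i = separation_time(A, i)
    # Truncation at the end of the sample is an assumption of this sketch.
    end = t_i if t_i is not None else len(A) + 1
    return sum(1 for j in range(k, end) if A[j - 1] == A[i - 1])


# Hypothetical sample of coalescence times A_1, ..., A_6.
SAMPLE = [2, 1, 2, 1, 3, 1]
```

With this sample, the value 2 appears twice before it is first exceeded at $t_1 = 5$, so `counter(SAMPLE, 1, 1)` counts two occurrences, and the count decreases as $k$ moves past each of them.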

Remark 1 In the discrete case the coalescence times $A_i$ take on integer values which may 
occur again after they have appeared for the last time $t_i$ in a subtree. In the continuous case, 
analogously defined coalescence times have a law that is absolutely continuous w.r.t. Lebesgue 
measure, so their values a.s. never occur again after they have appeared for the last time in a 
subtree; hence in the continuous case there is no need to use the separation times $t_i$ in the definition 
of the counters $N_i$ and $N_{ik}$. 

3 From discrete to continuous: the great-aunt measure 
3.1 Definition of the great-aunt measure 

Now that we have a simple description of the coalescent point process with multiplicities for a 
BGW branching process, our goal is to do the same for the continuous-state branching process. 
In the discrete case we used the process $(D_i; i \ge 1)$ to describe the number of surviving younger 
offshoots of ancestors of an individual as a sequence indexed by generations backwards in time. 
Since in the CSB case the standing population is not discrete, we cannot use a process indexed 
by the standing population. Instead, we use a spine decomposition of the lineage of a surviving 
individual, which will record the level (that is, generation in the discrete case, and height in the 
continuous case) and the number of all offshoot subtrees in the individual's genealogy. We first 
provide the law of the spine decomposition of the first individual in the standing population in 
the BGW case, and relate it to our previous results. At this point we would like to emphasize 
that results for the spine decomposition of BGW processes are not new (see references at the end 
of the subsection), and that we make use of the spine decomposition here only as a tool that 
will enable us to describe the analogue of the coalescent point process with multiplicities for 
the CSB process later. 

We give some new definitions to describe the spine decomposition of $(0,1)$, the first individual 
in generation 0, of a BGW process within its monotone planar embedding. For $n \ge 1$, 
we denote by $\phi_n$ the set of great-aunts of $(0,1)$ at generation $-n+1$, that is, 

\[ \phi_n := \{\text{daughters of } (-n, a_1(n)) \text{ excluding } (-n+1, a_1(n-1))\}, \]

and by $\varphi_n$ its cardinal, $\varphi_n := \#\phi_n$. In other words, $\varphi_n$ is the number of offshoots of $(-n, a_1(n))$ 
not counting the lineage of $(0,1)$ itself. The set $\phi_n$ can be divided into 

\[ \alpha_n := \{\text{sisters of } (-n+1, a_1(n-1)) \text{ with labels } (-n+1, k) \text{ s.t. } k < a_1(n-1)\}, \]

the older offshoots of $(-n, a_1(n))$, and 

\[ \rho_n := \{\text{sisters of } (-n+1, a_1(n-1)) \text{ with labels } (-n+1, k) \text{ s.t. } k > a_1(n-1)\}, \]

the younger offshoots of $(-n, a_1(n))$. We call the sequence $(\rho_n; n \ge 1)$ the great-aunt measure. 
We let $(Z_k^{\alpha_n}; 0 \le k \le n-1)$ and $(Z_k^{\rho_n}; 0 \le k \le n-1)$ be the processes counting the descendants 
of those $\alpha_n$ and $\rho_n$ individuals, respectively, at successive generations $-n+1, \dots, 0$. 

An important observation now is that we do not need the whole infinite embedding of trees 
to make the previous definitions. The descending tree of $(-h, a_1(h))$ is a planar BGW tree 
conditioned to have alive individuals at generation $h$, with the lineage of individual $(0,1)$ being 
the leftmost lineage with alive descendants at generation $h$. Consequently, provided that we 
only consider indices $n \le h$, our definitions make sense for any conditioned BGW planar tree. 
A number of results have already been proved in [13] for the spine decomposition of a BGW process 
conditioned on survival at a given generation, and we make use of them here. The infinite 
embedding of trees that we introduced extends these results in a way, as we are considering 
an arbitrarily large standing population that may not (in the (sub)critical case) descend from the 
same ancestor $n$ generations back in the past. Considering trees whose roots are individuals 
$(-n, a_1(n)), (-n, a_1(n)+1), \dots$, it is easy to see that they are independent and identically 
distributed as $T(n,1)$, the tree descending from $(-n, a_1(n))$. 

We recall some standard notation commonly used with linearly ordered planar trees. The 
Ulam-Harris-Neveu labeling of a planar tree assumes that each individual (vertex) of the tree 
is labelled by a finite word $u$ of positive integers whose length $|u|$ is the generation, or height, 
of the individual. A rooted, linearly ordered planar tree $T$ is a subset of the set $\mathcal{U}$ of finite 
words of integers 

\[ \mathcal{U} = \bigcup_{n \ge 0} \mathbb{N}^n, \]

where $\mathbb{N}^0$ is reduced to the empty word $\varnothing$. More specifically, the root of $T$ is denoted $\varnothing$ and her offspring are 
labelled $1, 2, 3, \dots$ from left to right. The recursive rule is that the offspring of any individual 
labelled $u$ are labelled $u1, u2, u3, \dots$ from left to right, where $ui$ is the mere concatenation of 
the word $u$ and the number $i$. The depth-first search is the lexicographical order associated 
with the Ulam-Harris-Neveu labeling (see Fig. 2, where the depth-first search gives the order 
$\varnothing, 1, 11, 12, 2, 21, 211, 212, 3, 31, 311, 32, \dots$). 
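
As an illustration of the labeling, here is a minimal sketch in which words are represented as Python tuples of positive integers and the root is the empty tuple. The vertex set `FIG2` lists only the labels readable from Fig. 2, and the helper names are hypothetical. Sorting tuples in Python is lexicographic with a prefix preceding its extensions, which is exactly the depth-first order described above.

```python
def is_planar_tree(tree):
    """Check that a set of words is a rooted, linearly ordered planar tree:
    it contains the root, and with each word its parent and its next older sister."""
    if () not in tree:
        return False
    for u in tree:
        if u:
            parent, rank = u[:-1], u[-1]
            if parent not in tree:
                return False
            if rank > 1 and parent + (rank - 1,) not in tree:
                return False
    return True


def depth_first_order(tree):
    """Depth-first search order = lexicographic order on words."""
    return sorted(tree)


def word(u):
    """Render a word as a string; the root is shown as the empty-set sign."""
    return "".join(str(letter) for letter in u) or "\u2205"


# Labels readable from Fig. 2 (explored part of the tree only).
FIG2 = {(), (1,), (1, 1), (1, 2), (2,), (2, 1), (2, 1, 1), (2, 1, 2),
        (3,), (3, 1), (3, 1, 1), (3, 2), (3, 2, 1), (3, 2, 2), (3, 2, 2, 1)}

ORDER = [word(u) for u in depth_first_order(FIG2)]
```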

Now fix $h \in \mathbb{N}$ and assume that generation $h$ is nonempty. Let $x_h$ be the Ulam-Harris-Neveu 
label of the first individual in depth-first search with height $h$. We denote by $(x_h \mid -n)$ 
the ancestor of $x_h$ at generation $h - n$. Then for $1 \le n \le h$, we can define $\alpha'_n$ and $\rho'_n$ to be the 
number of daughters of $(x_h \mid -n)$ ranked smaller and larger, respectively, than $(x_h \mid -n+1)$, and 
define $\varphi'_n := \alpha'_n + \rho'_n$. We can also let $Z^{\alpha'_n}$ and $Z^{\rho'_n}$ be the processes counting the descendants 
of those $\alpha'_n$ and $\rho'_n$ individuals, respectively. From now on, we assume that the tree $T$ has 
the law of a planar BGW tree with offspring distributed as $\xi$ and conditioned to have alive 
individuals at generation $h$. Then it is easily seen that $(\alpha'_n, \rho'_n, Z^{\alpha'_n} \circ k_n, Z^{\rho'_n} \circ k_n; 1 \le n \le h)$ 
(where $k_n$ means killing at time $n$) has the same distribution as $(\alpha_n, \rho_n, Z^{\alpha_n}, Z^{\rho_n}; 1 \le n \le h)$, 
so from now on we remove primes. The following result provides the joint law of these random 
variables; it was already shown in [13], and is just a restatement of 
Lemma 2.1 from [13]. Recall that $p_n = \mathbb{P}(Z_n \ne 0 \mid Z_0 = 1)$. 

Proposition 3.1 Conditional on the values of $(\alpha_n, \rho_n; n \ge 1)$, the processes $(Z^{\alpha_n}, Z^{\rho_n}; n \ge 1)$ 
are all independent, $Z^{\rho_n}$ is a copy of $Z$ started at $\rho_n$ and killed at time $n$, and $Z^{\alpha_n}$ is a copy of 
$Z$ started at $\alpha_n$, conditioned on being zero at time $n - 1$. In addition, the pairs $(\alpha_n, \rho_n; n \ge 1)$ 
are independent and distributed as follows 

\[ \mathbb{P}(\alpha_n = j, \varphi_n = k) = \mathbb{P}(\xi = k+1)\, \frac{p_{n-1}}{p_n}\, (1 - p_{n-1})^{j}, \qquad k \ge j \ge 0. \]

Figure 2: a) A Bienaymé-Galton-Watson tree conditioned to have alive individuals at generation 4. 
The empty circle is the first such individual (3221) in the lexicographical order. 
Descendants of individuals with greater rank are only indicated by a question mark; b) The 
associated random walk $W$ killed after the visit $\sigma_4$ of the first individual $x_4 = 3221$ with height 
4. Records of the future infimum at $\sigma_4$ are shown by dotted lines. The record times are exactly 
the visits to ancestors 322, 32, 3 and $\varnothing$ of 3221. The pairs of overshoots and undershoots of $W$ 
across the future infimum at those times (in this order) are $(\alpha_1, \rho_1) = (0, 2)$, $(\alpha_2, \rho_2) = (1, 1)$, 
$(\alpha_3, \rho_3) = (1, 2)$ and $(\alpha_4, \rho_4) = (2, 1)$. 

Remark 2 Recall that $D_1(n)$ is the number of sisters of $(-n+1, a_1(n-1))$ with alive descendants 
in the standing population. Thus, an immediate corollary of Proposition 3.1 is that 
the random variables $(D_1(n); n \ge 1)$ are independent and that, conditionally on $\rho_n$, $D_1(n)$ 
is a binomial r.v. with parameters $\rho_n$ and $p_{n-1}$. Since, according to Theorem 2.1, $D_1(n)$ is 
distributed as $\zeta'_n$, we also have 

\[ \mathbb{E}\big(s^{\zeta'_n}\big) = \mathbb{E}\big((1 - p_{n-1} + p_{n-1} s)^{\rho_n}\big), \]

as one can easily check. 
Note that the last statement in Proposition 3.1 can be viewed as an inhomogeneous spine 
decomposition of the 'ascendance ' of the surviving particles in conditioned BGW trees. More 
standard spine decompositions are well-known for the 'descendance ' of conditioned branching 
trees, see e.g. [9J [191 Ell EH 021 El US] (the idea of spine decompositions originating from [16]). 
To bridge the gap between both aspects, notice that in the (sub)critical case, $\mathbb{P}(\varphi_n + 1 = k) = 
\mathbb{P}(\xi = k)\,(1 - (1 - p_{n-1})^k)/p_n$ converges to $k\,\mathbb{P}(\xi = k)/\mathbb{E}(\xi)$ as $n \to \infty$, which is the size-biased 
distribution of $\xi$. This distribution is known to be the law of offspring sizes on the spine when 
conditioning the tree on infinite survival. 
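
The joint law of Proposition 3.1 and the size-biased limit can be checked numerically. The sketch below is only an illustration under stated assumptions: the offspring law `OFFSPRING` is a hypothetical subcritical example with finite support, the function names are not from the paper, and the survival probabilities are computed from the standard recursion $p_n = 1 - f(1 - p_{n-1})$, $p_0 = 1$, where $f$ is the p.g.f. of $\xi$.

```python
def survival_probs(offspring, n_max):
    """Return [p_0, ..., p_n_max] with p_n = P(Z_n != 0 | Z_0 = 1)."""
    def pgf(s):
        return sum(q * s ** k for k, q in enumerate(offspring))
    p = [1.0]
    for _ in range(n_max):
        p.append(1.0 - pgf(1.0 - p[-1]))
    return p


def joint_law(offspring, p, n):
    """Dict {(j, k): P(alpha_n = j, phi_n = k)} for k >= j >= 0."""
    ratio = p[n - 1] / p[n]
    return {(j, k): offspring[k + 1] * ratio * (1.0 - p[n - 1]) ** j
            for k in range(len(offspring) - 1) for j in range(k + 1)}


def family_size_law(offspring, p, n):
    """Marginal law of phi_n + 1, the family size on the spine at level n."""
    law = [0.0] * len(offspring)
    for (j, k), mass in joint_law(offspring, p, n).items():
        law[k + 1] += mass
    return law


# Hypothetical subcritical offspring law: P(xi = k) for k = 0, 1, 2, 3.
OFFSPRING = [0.5, 0.25, 0.15, 0.10]
P = survival_probs(OFFSPRING, 60)
MEAN = sum(k * q for k, q in enumerate(OFFSPRING))
SIZE_BIASED = [k * q / MEAN for k, q in enumerate(OFFSPRING)]
```

For each $n$ the masses returned by `joint_law` sum to one, at $n = 1$ all the mass sits on $\alpha_1 = 0$, and for large $n$ the output of `family_size_law` is close to `SIZE_BIASED`.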



3.2 A random walk representation 

We next show how to recover any truncation $(\rho_n; n \le h)$ of the great-aunt measure from the 
paths of a conditioned, killed random walk. This will be particularly useful when we define 
the analogue of the great-aunt measure for continuous-state branching processes in the next 
subsection. Our result makes use of a well-known correspondence between a BGW tree and a 
downward-skip-free random walk introduced and studied in [31 [26] . 

Let us go back to the planar tree $T$. We denote by $v_n$ the word labeling the $n$-th individual 
of the tree in the depth-first search. For any integers $i > j$ and any finite word $u$, we say that 
$ui$ is 'younger' than $uj$. Also we will write $u \prec v$ if $u$ is an ancestor of $v$, that is, there is a 
sequence $w$ such that $v = uw$ (in particular $u \prec u$). Lastly, for any individual $u$ of the tree, 
we let $r(u)$ denote the number of younger sisters of $u$, and for any integer $n$, we let $W_n := 0$ if 
$v_n = \varnothing$, and if $v_n$ is any individual of the tree different from the root, we let 

\[ W_n := \sum_{u \prec v_n} r(u). \]

The height $H_n$, or generation, of the individual visited at time $n$ can be recovered from $W$ as 

\[ H_n := |v_n| = \#\{1 \le k \le n : W_k = \min_{k \le j \le n} W_j\}. \qquad (2) \]

See Fig. 2, where $(W_n; n = 1, \dots, 15)$ is given until the visit of $v_{15} = 3221$, whose height is $H_{15} = 4$. 

In the case when T is a BGW tree with offspring distributed as £, then it is known [H [26] 
that the process $(W_n; n \ge 1)$ is a random walk started at 0, killed upon hitting $-1$, with steps 
in $\{-1, 0, 1, 2, \dots\}$ distributed as $\xi - 1$. 
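
The correspondence between the tree and the walk can be checked on a small example. In the sketch below (an illustration, not the authors' construction) words are tuples, the finite tree `TREE` is hypothetical, and individuals are indexed in depth-first order starting from the root at index 0, which is an indexing assumption of this sketch. The walk is the sum of the numbers of younger sisters along the ancestral line, its increments are offspring numbers minus one, and the height is recovered by counting the indices $1 \le k \le n$ with $W_k = \min_{k \le j \le n} W_j$.

```python
def younger_sisters(tree, u):
    """r(u): number of sisters of u with a larger last letter (0 for the root)."""
    if not u:
        return 0
    parent, rank = u[:-1], u[-1]
    return sum(1 for w in tree
               if len(w) == len(u) and w[:-1] == parent and w[-1] > rank)


def children(tree, u):
    """Number of offspring of u in the tree."""
    return sum(1 for w in tree if len(w) == len(u) + 1 and w[:-1] == u)


def walk(tree):
    """Depth-first order and the walk W_n = sum of r(u) over ancestors u of v_n."""
    order = sorted(tree)
    W = [sum(younger_sisters(tree, v[:i]) for i in range(1, len(v) + 1))
         for v in order]
    return order, W


def heights_from_walk(W):
    """H_n = #{1 <= k <= n : W_k = min_{k <= j <= n} W_j}, root at index 0."""
    return [sum(1 for k in range(1, n + 1) if W[k] == min(W[k:n + 1]))
            for n in range(len(W))]


# Hypothetical finite planar tree, given by its Ulam-Harris-Neveu words.
TREE = {(), (1,), (1, 1), (1, 2), (2,), (2, 1), (2, 1, 1), (2, 1, 2),
        (3,), (3, 1), (3, 1, 1), (3, 2), (3, 2, 1), (3, 2, 2), (3, 2, 2, 1)}

ORDER, W = walk(TREE)
```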

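Fix $h \in \mathbb{N}$ and set 

\[ \sigma_h := \min\{n \ge 1 : H_n = h\}. \]

Writing $T_j$ for the first hitting time of $j$, in particular $T_{-1}$ for the first hitting time of $-1$, we 
get 

\[ \Big\{\max_{0 \le j \le T_{-1}} H_j < h\Big\} = \{Z_h = 0\} = \{\sigma_h > T_{-1}\}. \qquad (3) \]

Now, when $\sigma_h < T_{-1}$, we let $I^{\sigma_h}$ denote the future infimum process of the random walk 
killed at $\sigma_h$, 

\[ I^{\sigma_h}_j := \min_{j \le r \le \sigma_h} W_r, \]

and we let $t_h = 0, \dots, t_0 = \sigma_h$ denote the successive record times of $I^{\sigma_h}$, 

\[ t_k := \max\{j < t_{k-1} : W_j = I^{\sigma_h}_j\}, \qquad 1 \le k \le h. \]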
Also observe that by definition of $\sigma_h$, we must have $t_1 = t_0 - 1$. Lastly, we use the notation 
$\Delta_j W := W_{j+1} - W_j$. A straightforward consequence of [26] is that when generation $h$ is 
nonempty, $t_k$ is the visit time of $(x_h \mid -k)$, i.e. the unique integer $n$ such that $v_n = (x_h \mid -k)$ 
(where $x_h$ is the first individual in depth-first search with height $h$). Furthermore, 

\[ \varphi_k = \Delta_{t_k} W = W_{t_k+1} - W_{t_k}, \]

and 

\[ \alpha_k = W_{t_k+1} - \min_{t_k+1 \le j \le t_{k-1}} W_j \quad\text{and}\quad \rho_k = \min_{t_k+1 \le j \le t_{k-1}} W_j - W_{t_k}. \]

In particular we can check $\alpha_1 = 0$, and $\varphi_k = \alpha_k + \rho_k$. Now note that, by definition of $t_h, \dots, t_0$, 
we have $I^{\sigma_h}_{t_k} = W_{t_k}$ and for all $j$ such that $t_k < j \le t_{k-1}$ we have $I^{\sigma_h}_j = W_{t_{k-1}}$, so 

\[ \alpha_k = W_{t_k+1} - I^{\sigma_h}_{t_k+1} = W_{t_k+1} - W_{t_{k-1}} \quad\text{and}\quad \rho_k = I^{\sigma_h}_{t_k+1} - W_{t_k} = W_{t_{k-1}} - W_{t_k}. \]

An illustration of these claims can be seen in Fig. 2. 
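
The record-time description can be replayed on the tree of Fig. 2. The sketch below rests on assumptions that should be read as such: the vertex set `FIG2_TREE` is a reconstruction from the figure caption (the individuals shown only as question marks are added as childless younger sisters so that the caption's pairs are reproduced), individuals are indexed in depth-first order with the root at index 0, and the function names are hypothetical.

```python
def younger_sisters(tree, u):
    """r(u): number of sisters of u with a larger last letter (0 for the root)."""
    if not u:
        return 0
    parent, rank = u[:-1], u[-1]
    return sum(1 for w in tree
               if len(w) == len(u) and w[:-1] == parent and w[-1] > rank)


def walk_and_heights(tree):
    """Depth-first order, walk W and heights H (root at index 0)."""
    order = sorted(tree)
    W = [sum(younger_sisters(tree, v[:i]) for i in range(1, len(v) + 1))
         for v in order]
    return order, W, [len(v) for v in order]


def great_aunts(W, H, h):
    """Record times t_0 > t_1 > ... > t_h and pairs (alpha_k, rho_k), k = 1..h."""
    sigma = H.index(h)                                   # first visit of height h
    I = [min(W[j:sigma + 1]) for j in range(sigma + 1)]  # future infimum
    t = [sigma]
    for _ in range(h):
        # t_k = max{j < t_{k-1} : W_j = I_j}
        t.append(max(j for j in range(t[-1]) if W[j] == I[j]))
    pairs = [(W[t[k] + 1] - W[t[k - 1]], W[t[k - 1]] - W[t[k]])
             for k in range(1, h + 1)]
    return t, pairs


# Reconstruction of the tree of Fig. 2; the last six words stand for the
# individuals drawn as question marks (an assumption of this sketch).
FIG2_TREE = {(), (1,), (1, 1), (1, 2), (2,), (2, 1), (2, 1, 1), (2, 1, 2),
             (3,), (3, 1), (3, 1, 1), (3, 2), (3, 2, 1), (3, 2, 2), (3, 2, 2, 1),
             (3, 2, 2, 2), (3, 2, 2, 3), (3, 2, 3), (3, 3), (3, 4), (4,)}

ORDER, W, H = walk_and_heights(FIG2_TREE)
T, PAIRS = great_aunts(W, H, 4)
```

Under these assumptions the record times are the visits of 3221, 322, 32, 3 and the root, and `PAIRS` reproduces the values $(0,2)$, $(1,1)$, $(1,2)$, $(2,1)$ listed in the caption of Fig. 2.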
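Last, observe that 

\[ \sum_{k=1}^{h} \rho_k f(k) = \sum_{k=1}^{h} \big(I^{\sigma_h}_{t_k+1} - I^{\sigma_h}_{t_k}\big) f(k) = \sum_{k=1}^{h} \sum_{j=0}^{\sigma_h - 1} \mathbf{1}_{\{j = t_k\}} \big(I^{\sigma_h}_{j+1} - I^{\sigma_h}_{j}\big) f(h - H_j) = \sum_{j=0}^{\sigma_h - 1} \big(I^{\sigma_h}_{j+1} - I^{\sigma_h}_{j}\big) f(h - H_j), \]

since, if $j = t_k$ for some $k$, then $H_j = h - k$; 
otherwise, $t_k < j < t_{k-1}$ for some $k$, and 

\[ I^{\sigma_h}_{j+1} - I^{\sigma_h}_{j} = W_{t_{k-1}} - W_{t_{k-1}} = 0. \]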

This is recorded in the following statement, which we will use later as a distributional equality 
for a random walk $W$ conditioned on $\max_{1 \le j \le T_{-1}} H_j \ge h$. 

Proposition 3.2 For any $f : \mathbb{N} \to \mathbb{R}_+$ with bounded support, $h > \mathrm{Supp}(f)$, if $\max_{1 \le j \le T_{-1}} H_j \ge h$, then 

\[ \langle \rho, f \rangle := \sum_{k \ge 1} \rho_k f(k) = \sum_{j=0}^{\sigma_h - 1} \Delta_j I^{\sigma_h} f(h - H_j), \]

where $\Delta_j I^{\sigma_h} = I^{\sigma_h}_{j+1} - I^{\sigma_h}_{j}$. 

3.3 A continuous version of the great-aunt measure 

In Section 2, we gave a consistent way of embedding trees with an arbitrary size of the standing 
population, each descending from an arbitrarily old founding ancestor, so that the descending 
subtree of each vertex is a BGW tree. The natural analog of this presentation is the flow of 
subordinators introduced by Bertoin and Le Gall. Because the Poissonian construction of 
this flow displayed in [5, Section 2] only holds for critical CSB processes and without Gaussian 
component, and because it is rather awkward to use in order to handle the questions we 
address here, we will now define an analogue of the great-aunt measure, using the genealogy 
of continuous-state branching processes introduced in [26] and further investigated in [8]. 
We start with a Lévy process $X$ with no negative jumps and Laplace exponent $\psi$ started 
at $x > 0$, such that $\psi'(0+) \ge 0$, so that $X$ hits 0 a.s. As specified in [22], [23], the path of $X$ 
killed upon reaching 0 can be seen as the (jumping) contour process of a continuous tree whose 
ancestor is the interval $(0, x)$. For example, the excursions of $X$ above its past infimum $I$ are 
the contour processes of the offspring subtrees of the ancestor. Almost surely for all $t$, it is 
possible to define the height (i.e., generation) $H_t$ of the point of the tree visited at time $t$, as 

\[ H_t := \lim_{k \to \infty} \frac{1}{\varepsilon_k} \int_0^t \mathbf{1}_{\{X_s < I_t^s + \varepsilon_k\}}\, ds < \infty, \qquad (4) \]

where $(\varepsilon_k)$ is some specified vanishing positive sequence, and 

\[ I_t^s := \inf_{s \le r \le t} X_r. \]

With this definition, one can recover the population size at generation a as the density Z a of 
the occupation measure of H at a, that is, the (total) local time of H at level a. It is proved 
in [261 E] that this local time exists a.s. for all a and that (Z a ;a > 0) is a continuous-state 
branching process with branching mechanism ip. 

From now on, we will deal with a general branching mechanism $\psi$ characterized by its 
Lévy-Khinchin representation 

\[ \psi(\lambda) = \alpha\lambda + \beta\lambda^2 + \int_{(0,\infty)} \Lambda(dr)\, \big(e^{-\lambda r} - 1 + \lambda r \mathbf{1}_{\{r < 1\}}\big), \]

where $\beta \ge 0$ is the Gaussian component and $\Lambda$ is a positive measure on $(0, \infty)$ such that 
$\int_{(0,\infty)} (1 \wedge r^2)\, \Lambda(dr) < \infty$, called the Lévy measure. We will denote by $X$ a Lévy process 
with Laplace exponent $\psi$ (started at 0 unless otherwise stated), and by $Z$ a continuous-state 
branching process, or CSB process, with branching mechanism $\psi$. 
We (only) make the following two assumptions. First, 

\[ \sup\{\lambda : \psi(\lambda) \le 0\} =: \eta < \infty, \]

so that $X$ is not a subordinator (i.e., it is not a.s. nondecreasing). Second, 

\[ \int^{\infty} \frac{du}{\psi(u)} < \infty, \]

so that $Z$ either is absorbed at 0 in finite time or goes to $\infty$, and $H$ has a.s. continuous sample 
paths [26]. This also forces $\int_{(0,1)} r\, \Lambda(dr)$ to be infinite, so that the paths of $X$ have infinite 
variation. We further set 

\[ \phi(\lambda) := \int_{[\lambda, \infty)} \frac{du}{\psi(u)}, \qquad \lambda > \eta, \]

and $v$ the inverse of $\phi$ on $(\eta, +\infty)$, 

\[ v(x) := \phi^{-1}(x), \qquad x \in (0, \infty). \]

Notice that $v$ is nonincreasing and has limit $\eta$ at $+\infty$. It is well-known (e.g. [22]) that if $Z$ is 
started at $x$, then it is absorbed at 0 before time $t$ with probability $e^{-x v(t)}$. Also, if $N$ denotes 
the excursion measure of $X - I$ away from 0 under $\mathbb{P}$ (normalized so that $-I$ is the local time), 
then [8, Corollary 1.4.2] 

\[ N(\sup H > a) = v(a), \qquad a > 0. \]
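
For a concrete branching mechanism the functions $\phi$ and $v$ can be evaluated numerically. The following sketch is an illustration under assumptions: it takes $\eta = 0$ (so $\psi > 0$ on $(0, \infty)$), uses the substitution $u = \lambda/s$ with a midpoint rule for the integral, and inverts $\phi$ by bisection on a logarithmic scale; the function names and the two example mechanisms $\psi(\lambda) = \lambda^2$ and $\psi(\lambda) = \lambda^2 + \lambda$ are choices made here, for which $v(x) = 1/x$ and $v(x) = 1/(e^x - 1)$ respectively.

```python
import math


def phi(psi, lam, n=2000):
    """phi(lam) = integral over [lam, infinity) of du / psi(u).

    With u = lam / s the integral becomes the integral over (0, 1] of
    lam / (s^2 psi(lam / s)) ds, approximated by the midpoint rule.
    """
    total = 0.0
    for i in range(n):
        s = (i + 0.5) / n
        total += lam / (s * s * psi(lam / s))
    return total / n


def v(psi, x, lo=1e-9, hi=1e9, iters=80):
    """Inverse of the decreasing function phi at x, assuming eta = 0."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if phi(psi, mid) > x:
            lo = mid   # phi(mid) too large, so v(x) lies to the right of mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def absorption_probability(psi, x, t):
    """P(Z absorbed at 0 before time t | Z_0 = x) = exp(-x v(t))."""
    return math.exp(-x * v(psi, t))
```

For instance, with $\psi(\lambda) = \lambda^2$ the call `absorption_probability(lambda u: u * u, 1.0, 2.0)` approximates $e^{-1/2}$.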

Here, we want to allow the heights of the tree to take negative values. To do this, we 
start with a measure which embodies the mass distribution broken down on heights, of the 
population whose descendances have not yet been visited, in the same vein as in [261 EL but 
with negative heights. In the usual setting, the mass distribution $\rho_t$ of the population whose 
descendances have not yet been visited by the contour process $X$ by time $t$ is defined by 

\[ \langle \rho_t, f \rangle = \int_{[0,t]} d_s I_t^s\, f(H_s), \]

for any non-negative function $f$, where on the right-hand side we mean integrating the function 
$s \mapsto f(H_s)$ with respect to the Stieltjes measure associated with the function $s \mapsto I_t^s$. Then 
$\rho_t([a, b])$ is the mass of the tree between heights $a$ and $b$ whose descendants have not yet been 
visited by the contour process. 

Here, we will start with a random positive measure $\rho^0$ on $[0, +\infty)$, with the interpretation 
that $\rho^0([a, b])$ is the mass of the tree between heights $-b$ and $-a$ whose descendants have not 
yet been visited by the contour process. This measure is the exact analogue of the great-aunt 
measure of the previous subsection. 

Definition 3.3 For every $t > 0$, set $\pi_t(dr) := p(t, r)\, dr$, where 

\[ p(t, r) := e^{v(t) r} \int_{(r, \infty)} e^{-v(t) z}\, \Lambda(dz), \qquad r > 0. \]

We define $\rho^0$ in law by 

\[ \rho^0(dx) := \beta\, dx + \sum_{t : \Delta_t > 0} \Delta_t\, \delta_t(dx), \]

where $\delta$ denotes a Dirac measure, and $(t, \Delta_t)$ is a Poisson point measure with intensity measure 
$dt\, \pi_t(dr)$. 

3.4 Convergence of the great-aunt measure 

We can now prove a theorem yielding two justifications for the definition of the measure $\rho^0$. 
First, we show that $\rho^0(dx)$ as defined above is indeed the mass, at height $h - x$, of the part 
of the tree whose descendants have not yet been visited, either by a long-lived contour process 
($h \to \infty$) or under the measure $N(\cdot \mid \sup H > h)$. Second, we show the convergence of the 
appropriately rescaled discrete great-aunt measures to the measure $\rho^0$, as the BGW processes 
approach the CSB process $Z$ with branching mechanism $\psi$. In the next section we will show 
how $\rho^0$ allows us to define the coalescent point process with multiplicities for the CSB process, 
and helps us establish convergence from the appropriately rescaled point process $(B_i; i \ge 1)$. 

We assume there exists a sequence $(\gamma_p; p \ge 1)$, $\gamma_p \to \infty$ as $p \to \infty$, and a sequence of 
random variables $(\xi_p; p \ge 1)$, such that, if $W^{(p)}$ denotes the random walk with steps distributed 
as $\xi_p - 1$, the random variables $(p^{-1} W^{(p)}_{[p\gamma_p]})$ converge in law to $X_1$, where $X$ denotes the Lévy 
process with Laplace exponent $\psi$. Then it is known ([15, Theorems 3.1 and 3.4]) that if $Z^{(p)}$ 
denotes the BGW process started at $[px]$ with offspring size distributed as $\xi_p$, then the rescaled 
processes $(p^{-1} Z^{(p)}_{[\gamma_p t]}; t \ge 0)$ converge weakly in law in Skorokhod space to the CSB process $Z$ 
with branching mechanism $\psi$, started at $x$. 

In the following statement, we denote by $\rho^{(p)}$ the great-aunt measure associated to the 
offspring size $\xi_p$. We have to make the following additional assumption: if $f^{(p)}$ denotes the 
p.g.f. of $\xi_p$ and $f^{(p)}_n$ its $n$-th iterate, then for each $\delta > 0$, 

\[ \liminf_{p \to \infty} f^{(p)}_{[\delta\gamma_p]}(0)^p > 0. \]

Convergence results in (iii) below rely heavily on results already established by Duquesne 
and Le Gall in [8] on convergence of appropriately rescaled random walks $W^{(p)}$ and their 
height processes $H^{(p)}$ to the Lévy process $X$ and its height process $H$. In particular, technical 
conditions such as the one above are justified in [8, Section 2.3]. 

Theorem 3.4 Let $\sigma_h$ denote the first time that $H$ hits $h > 0$. 

(i) For any $h > 0$ and for any non-negative Borel function $f$ vanishing outside $[0, h]$, the 
random variable $\int_{[0, \sigma_h]} d_s I^s_{\sigma_h} f(h - H_s)$, under $N(\cdot \mid \sup H > h)$, has the same law as 
$\langle \rho^0, f \rangle$; 

(ii) For any non-negative Borel function $f$ with compact support, as $h \to \infty$ the random 
variables $\int_{[0, \sigma_h]} d_s I^s_{\sigma_h} f(h - H_s)$ under $\mathbb{P}(\cdot \mid \sigma_h < \infty)$ converge in distribution to $\langle \rho^0, f \rangle$; 

(iii) For any non-negative Borel function $f$ with compact support and sequence of non-negative 
continuous functions $\{f_p\}$ such that $f_{p,\gamma_p}(x_p) := f_p(\gamma_p^{-1} x_p) \to f(x)$ whenever $\gamma_p^{-1} x_p \to x$, 
the random variables $p^{-1} \langle \rho^{(p)}, f_{p,\gamma_p} \rangle$ converge in distribution to $\langle \rho^0, f \rangle$ as $p \to \infty$. 

Proof. Let us prove (i). It is known [8] that a.s. for all $t$ the inverse of the local time on $[0, t]$ 
of the set of increase times of $(I_t^s; s \in [0, t])$ has drift coefficient $\beta$, so that 

\[ \int_{[0,t]} d_s I_t^s\, f(H_s) = \beta \int_0^{H_t} f(x)\, dx + \sum_{s \in [0,t]} f(H_s)\, \Delta_s I_t^s, \]

where the sum is taken over all times $s$ at which $(I_t^s; s \in [0, t])$ has a jump, whose size is then 
denoted $\Delta_s I_t^s$. This set of times will be denoted $J_h$ when $t = \sigma_h$. As a consequence, it only 
remains to prove that the random point measure $M_h$ on $(0, h) \times (0, \infty)$ defined by 

\[ M_h := \sum_{s \in J_h} \delta_{(H_s,\, \Delta_s I^s_{\sigma_h})}, \]

where the sum is zero when $J_h$ is empty ($\sigma_h = \infty$), is an inhomogeneous Poisson point measure 
with the correct intensity measure. More precisely, we are going to check that for any 
non-negative two-variable Borel function $f$, 

\[ N\Big( \int_{(0,h) \times (0,\infty)} M_h(dt\, dr)\, f(h - t, r) \,\Big|\, \sup H > h \Big) = \int_0^h dt \int_0^\infty dr\, p(t, r) f(t, r), \]

that is, 

\[ N \int_{(0,h) \times (0,\infty)} M_h(dt\, dr)\, f(h - t, r) = v(h) \int_0^h dt \int_0^\infty dr\, p(t, r) f(t, r). \]

First notice that 

\[ N \int_{(0,h) \times (0,\infty)} M_h(dt\, dr)\, f(h - t, r) = N \sum_{s : \Delta X_s > 0} \mathbf{1}_{\{s < \sigma_h\}}\, f(h - H_s, \Delta X_s + K'_s)\, \mathbf{1}_{\{\Delta X_s > -K'_s\}}, \]

where $K'_s$ is the global infimum of the shifted path $X'$, 

\[ X'_u := X_{s+u} - X_s, \qquad 0 \le u \le \sigma_h - s. \]

But on $\{s < \sigma_h\}$, $\sigma_h - s$ is also $\sigma'_{h - H_s}$ (with obvious notation), so by predictable projection 
and by the compensation formula applied to the Poisson point process of jumps, we get 

\[ N \int_{(0,h) \times (0,\infty)} M_h(dt\, dr)\, f(h - t, r) = N \int_0^{\sigma_h} ds \int_{(0,\infty)} \Lambda(dz)\, G_f(z, H_s), \]

where (with the notation $I$ for the current infimum) 

\[ G_f(z, t) = \mathbb{E}\Big[ f\big(h - t, z + I_{\sigma_{h-t}}\big)\, \mathbf{1}_{\{-I_{\sigma_{h-t}} < z\}} \Big]. \]

Now $-I_{\sigma_{h-t}}$ is the local time at the first excursion of $X - I$ with height larger than $h - t$, so it 
is exponentially distributed with parameter $N(\sup H > h - t) = v(h - t)$. As a consequence, 

\[ G_f(z, t) = v(h - t) \int_0^z dr\, f(h - t, r)\, e^{-(z - r) v(h - t)}, \]

which yields 

\[ N \int_{(0,h) \times (0,\infty)} M_h(dt\, dr)\, f(h - t, r) = N \int_0^{\sigma_h} ds\, F_f(h - H_s), \]

with 

\[ F_f(t) = \int_{(0,\infty)} \Lambda(dz)\, G_f(z, h - t) = v(t) \int_{(0,\infty)} \Lambda(dz) \int_0^z dr\, f(t, r)\, e^{-(z - r) v(t)} \]
\[ = v(t) \int_0^\infty dr\, f(t, r)\, e^{r v(t)} \int_{(r,\infty)} \Lambda(dz)\, e^{-z v(t)} = v(t) \int_0^\infty dr\, p(t, r) f(t, r). \]

So we only have to verify that for any non-negative Borel function $g$, 

\[ N \int_0^{\sigma_h} ds\, g(h - H_s) = \int_0^h dt\, \frac{v(h)}{v(t)}\, g(t). \]

Due to results in [26], there indeed is a jointly measurable process $(Z(a, t); a, t \ge 0)$ such that 
a.s. for all $a$, 

\[ \int_0^a ds\, g(H_s) = \int_0^\infty dt\, Z(a, t)\, g(t). \]

In particular, 

\[ N \int_0^{\sigma_h} ds\, g(h - H_s) = \int_0^h dt\, N(Z(\sigma_h, t))\, g(h - t), \]

so we just need to check that $N(Z(\sigma_h, t)) = v(h)/v(h - t)$, that is, 

\[ N(Z(\sigma_h, t) \mid \sup H > h) = \frac{1}{v(h - t)}. \]

But, conditional on $\sup H > h$, $\sigma_t < \infty$, and $Z(\sigma_h, t)$ is the local time of $H$ at level $t$ between 
$\sigma_t$ and $\sigma_h$, which is exponential with parameter $N(\sup H > h - t) = v(h - t)$, hence the claimed 
expectation. 

We proceed with (ii), which readily follows from (i). Indeed, for any $h$ larger than $\sup\{x : 
f(x) \ne 0\}$, 

\[ \int_{[0, \sigma_h]} d_s I^s_{\sigma_h}\, f(h - H_s) = \int_{[0, \sigma'_h]} d_s I'^{\,s}_{\sigma'_h}\, f(h - H'_s), \]

where primes indicate that the future infimum and the height process are taken w.r.t. the 
process $X'$ with law $N(\cdot \mid \sup H > h)$, defined as 

\[ X'_t = X_{\rho_h + t} - I_{\sigma_h}, \qquad 0 \le t \le \tau_h - \rho_h, \]

where $\rho_h$ is the unique time $s \le \sigma_h$ when $X_s = I_{\sigma_h}$ and $\tau_h$ is the first time $t \ge \sigma_h$ when 
$X_t = I_{\sigma_h}$. 

We end the proof with (hi) . Thanks to Proposition 13.21 we know that < p^ , f Pt y p > has 
the same law as 

lph> 



£ Al^%^ p h-H^ 



3=0 

conditional on max . ( p ) > jph, where is the height process associated with 

Tzl is the first hitting time of —1 by W^ p \ a v ^ h denotes the first hitting time of j p h by H^ p \ 
and JW denotes the future infimum process of stopped at time a^ h . If we can prove 

convergence of (p^W, . v 1 ,7~ 1 iT r . v i) under this conditional law to (X. Aa , , H.A a , ) 
& ^ \pip-^ ph V 'p \pi P -^ ph Y v h hJ 

under the measure N(- | supiJ > h), then by the generalized continuous mapping theorem 
(e.g. Theorem 4.27 of |17j ) we will get the convergence of 

U p V 



which has the law of p 1 < p( p \ f p ^ p >, to Jj Q , d s I^ h f(h — H s ) under N(- | sup H > h) which 
has the law of < p°, / > thanks to (i). 
For (sub)critical $\xi_p$ the convergence of this pair in distribution on the Skorokhod space of 
càdlàg real-valued paths is a direct consequence of Corollary 2.5.1 and Proposition 
2.5.2 already shown in [8, Section 2.5]. To verify that the same holds for supercritical $\xi_p$ as well, 
we note that the assumption of (sub)criticality (hypothesis (H2) in the notation of [8]) is not 
crucial in any of the steps of their proof, since the obtained convergence relies essentially only on 
the assumption that the rescaled random walk $p^{-1} W^{(p)}$ converges to $X$ (see the comment in the proof of Theorem 2.2.1 
in [8] that any use of subcriticality in their proof can be replaced by using weak convergence of 
random walks). Of course in the supercritical case the height process $H$ may drift off to infinity, 
corresponding to the event that the CSB process $Z$ survives forever, in which case it will code 
only an incomplete part of the genealogy of the first lineage which survives forever. However, 
for our purposes we only need to consider the genealogy of the first lineage that survives until 
time $h$, so this will not be an impediment for our considerations. 

For supercritical £ p we still have the convergence of the pair 

in distribution on the Skorokhod space of cadlag real-valued paths as in Theorem 2.3.1 and 
the first part of Corollary 2.5.1 in [8j Section 2.5]. We now follow the same reasoning as in 
Proposition 2.5.2 of [8j. Let GP h = sup{s < a 9 , h : H^ p) = 0}. If we think of H&> as the height 
process for a sequence of independent BGW trees with offspring distribution £ p then G? h is the 
initial point of the first BGW tree in this sequence which reaches a height j p h. Let (W^ p \ H^) 
denote the process obtained by conditioning {W^ p \ H^) on max „( P ) > 7„/t. Then, 

Let Gh = sup{s < ah : H s = 0}, and let (X,H) denote the process obtained by conditioning 
(X, H) on sup H > h, then 

(X. Aah ,H. Aah ) = (X( Gh+ .) A(Th , H( Gh +-)Aa h )- 

If we use the Skorokhod representation theorem to assume that the convergence © holds a.s., 
the same arguments as in proof Proposition 2.5.2 of [8J imply that a p h converges a.s. to ah and 

G p , converges a.s. to G p h , and hence (p~ 1 W^ ,„ P . ■.. P ,,7~ 1 -ff r ( ' p ^ ,„ P , p n ) converges 

a.s. to (X( Gh+ . )Aah ,H iGh+ . )Aah ), proving that 

yF \P7pV 'P \pip-V p^oo K ' 

in distribution on the Skorokhod space of cadlag real- valued paths. From this the desired 
convergence in distribution of < f P)lp > conditional on m&x 1< _. <T ( p ) > j p h to 

< p°, / > under the measure N(- \ sup H > h) follows. □ 
4 The coalescent point process in the continuous case 

4.1 Definition of the genealogy 

We now define the analogue of a coalescent point process with multiplicities for the genealogy 
(other than the immediately recent) of an arbitrarily large standing population of a CSB process 
$Z$. We do so by first constructing the height process, $H^*$, for a planar embedding of CSB trees 
of arbitrary size descending from an arbitrarily old ancestor, using the continuous version of the 
great-aunt measure, $\rho^0$. From $H^*$ we define coalescence times of two masses from the standing 
population in the usual way, that is, from the maximal depths of the trajectory of $H^*$. 

We construct the height $H^\star_t$ of the individual visited at time $t$ from two terms: the height $H_t$
defined previously, which is the height at which that individual occurs in an excursion of $X$ above its
infimum; plus the height at which this excursion branches off the unexplored part of the tree.
More specifically, let

$$Y^0_x := \rho^0([0,x]), \qquad x \ge 0,$$

so that

$$Y^0_x = \beta x + \sum_{t \le x} \Delta_t,$$

where $(t, \Delta_t)$ is a Poisson point measure with intensity $dt\,\pi^{(t)}(dr)$ from Definition 3.3.
Next, let $L^0$ denote the right-inverse of $Y^0$,

$$L^0(t) := \inf\{a : Y^0_a > t\}, \qquad t \ge 0.$$

Then define

$$H^\star_t := H_t - L^0(-I_t),$$

where we recall that $I_t = \inf_{0 \le s \le t} X_s$.

This gives a spine decomposition of the genealogy of the continuous-state branching process
associated with $H^\star$ in the following sense. For the individual visited by the traversal process at
time $t$, the level (measured back into the past, with the present having level 0) on the infinite
spine at which the subtree containing this individual branches off is $-L^0(-I_t)$, and the relative
height of this individual within this subtree is $H_t$.

As in the discrete case, we want to display the law of the coalescence time between successive
individuals at generation 0. In this setting, this corresponds to the maximum depth below 0 of
the height process between successive visits of 0. Actually, any point at height 0 is a point of
accumulation of other points at height 0, so we discretize the population at height 0 as follows.
We consider all points at height 0 such that the height process between any two of them goes
below $-\varepsilon$, for some fixed $\varepsilon > 0$. Namely, we set $T_0 := 0$, and for any $i \ge 1$,

$$S_i := \inf\{t \ge T_{i-1} : H^\star_t = -\varepsilon\} \quad\text{and}\quad T_i := \inf\{t \ge S_i : H^\star_t = 0\}.$$

Then the coalescence times $A_1^\varepsilon, A_2^\varepsilon, \ldots$ of the $\varepsilon$-discretized population are defined as

$$A_i^\varepsilon := -\inf\{H^\star_t : T_{i-1} \le t \le T_i\}. \tag{6}$$

As in the discrete case, one of the main difficulties lies in the fact that the same value of $A_i^\varepsilon$
can be repeated several times. So we define

$$N_i^\varepsilon := \#\{j \ge i : A_j^\varepsilon = A_i^\varepsilon\}. \tag{7}$$
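To make definitions (6) and (7) concrete, here is a minimal sketch that reads the $\varepsilon$-discretized coalescence times off a finite, sampled height path. The function names and the sampled path are hypothetical; on a grid we assume that "$H^\star = -\varepsilon$" can be replaced by "$H^\star \le -\varepsilon$" and "$H^\star = 0$" by "$H^\star \ge 0$", which is a discretization convention and not part of the construction above.

```python
from typing import List, Sequence


def coalescence_times(path: Sequence[float], eps: float) -> List[float]:
    """Return A_1, A_2, ... for a sampled height path started at height 0.

    Grid convention (an assumption): hitting -eps means path[t] <= -eps,
    returning to 0 means path[t] >= 0.
    """
    times: List[float] = []
    t_prev = 0  # T_0 := 0
    n = len(path)
    while True:
        # S_i: first index from T_{i-1} on where the path is at depth eps or deeper
        s = next((t for t in range(t_prev, n) if path[t] <= -eps), None)
        if s is None:
            break
        # T_i: first index from S_i on where the path is back at height 0
        t_next = next((t for t in range(s, n) if path[t] >= 0), None)
        if t_next is None:
            break
        # A_i := -inf{H*_t : T_{i-1} <= t <= T_i}
        times.append(-min(path[t_prev:t_next + 1]))
        t_prev = t_next
    return times


def multiplicities(times: Sequence[float]) -> List[int]:
    """N_i := #{j >= i : A_j = A_i}."""
    return [sum(1 for a in times[i:] if a == times[i]) for i in range(len(times))]
```

For instance, with `eps = 1` the dip to $-0.2$ in the path `[0, -1, -0.5, -2, 0, 0.5, -0.2, 0, -1.5, 0]` is ignored, because it does not reach depth $\varepsilon$, and only two coalescence times are recorded.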
4.2 The supercritical case

Actually, the previous definition of the genealogy only holds for the subcritical and critical cases,
and a modification needs to be made in the supercritical case because of the possible appearance of
branches with infinite survival times into the future. What we need to do in the supercritical
case is construct a height process $H^\star$ that corresponds to a CSB tree whose infinite branches
have been truncated, much as in the last chapter of [25]. One would naturally be led to
consider the height process of a tree truncated at some finite height. However, the distribution
of such an object is far more complicated than the one obtained by truncating (actually, reflecting) the associated
Lévy process at some finite level. On the other hand, the genealogy coded by a Lévy process
reflected below some fixed level is not easy to read, since it will have incomplete subtrees at
different heights. To overcome this difficulty, we will construct a consistent family of Lévy
processes $X^\kappa$ reflected below level $\kappa$, so that if $\kappa' > \kappa$, $X^\kappa$ can be obtained from $X^{\kappa'}$ by
excising the subpaths above $\kappa$. The genealogy coded by the projective limit of this family is
the supercritical Lévy tree, as is shown in [22] in the case with finite variation. We first give
the details of this construction in the discrete case and then sketch its definition in the CSB
case. Let $T$ be a planar, discrete, possibly infinite, tree, in the sense that it can have infinite
height but all finite breadths. Then $T$ still admits a Ulam-Harris-Neveu labeling and, as in
Subsection 3.2, we can define $W(\emptyset) = 0$ and

$$W(v) := \sum_{u \preceq v} r(u),$$

where $r(u)$ denotes the number of younger sisters of $u$. Now for any $\kappa \in \mathbb{N}$, let $T^\kappa$ denote the
graph obtained from $T$ by deleting all vertices $v$ such that $W(v) > \kappa$. It is then easy to see
that $T^\kappa$ is still a tree, and that if $v^\kappa_n$ denotes the $n$-th vertex of $T^\kappa$ in the lexicographical order
and $W^\kappa_n := W(v^\kappa_n)$, then for any $\kappa' > \kappa$, the path of $W^\kappa$ can be obtained from the path of $W^{\kappa'}$
by excising the subpaths above $\kappa$. In other words,

$$W^\kappa = C^\kappa(W^{\kappa'}),$$

where for any path $e$, the functional $C^\kappa$ erasing subpaths above $\kappa$ can be defined as follows.
Let $A^\kappa$ denote the additive functional

$$A^\kappa_n(e) := \sum_{k=0}^{n} \mathbf{1}_{\{e_k \le \kappa\}}$$

and $a^\kappa$ its right inverse

$$a^\kappa_k(e) := \min\{n : A^\kappa_n(e) > k\}, \qquad k \in \mathbb{N}.$$

Then $C^\kappa(e) := e \circ a^\kappa(e)$.
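The excision functional can be sketched for a finite integer path as follows. The function name `excise_above` is hypothetical, and we assume that the occupation count keeps the values $e_k \le \kappa$ (matching the deletion of vertices with $W(v) > \kappa$); the code builds the additive functional, inverts it, and time-changes the path.

```python
from typing import List, Sequence


def excise_above(path: Sequence[int], kappa: int) -> List[int]:
    """C^kappa(e) = e o a^kappa(e): erase the portions of the path above kappa."""
    # Additive functional: occupation[n] = #{k <= n : path[k] <= kappa}
    occupation: List[int] = []
    count = 0
    for value in path:
        count += value <= kappa
        occupation.append(count)
    # Right inverse: a_k = min{n : occupation[n] > k}, for each k that is reached
    out: List[int] = []
    n = 0
    for k in range(count):
        while occupation[n] <= k:
            n += 1
        out.append(path[n])
    return out
```

On a finite path the time change simply retains the values that do not exceed $\kappa$, in their original order, which makes the consistency property $C^\kappa \circ C^{\kappa'} = C^\kappa$ for $\kappa' > \kappa$ easy to check numerically.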

Further, it can be shown that $W^\kappa$ has the law of the random walk with steps distributed as
$\xi - 1$, reflected below $\kappa$ and killed upon hitting $-1$. Since these two properties (consistency by
truncation and marginal distribution) characterize the family of reflected, killed random walks
$(W^\kappa; \kappa \in \mathbb{N})$, its continuous analogue has the following natural definition.
For any $\kappa > 0$ let $X^\kappa$ denote a Lévy process without negative jumps and with Laplace exponent
$\psi$, reflected below $\kappa$ and killed upon reaching 0. The reflection can be properly defined as
follows. Start with the path of a Lévy process $X$ (with the same Laplace exponent), set $S_t :=
\sup_{0 \le s \le t} X_s$, and define $X^\kappa_t := X_t$ if at time $t$, $X$ has not yet hit $(\kappa, +\infty)$, and $X^\kappa_t := \kappa + X_t - S_t$
otherwise (and kill $X^\kappa$ when it hits 0). The same definition could be given by concatenating
excursions of $X$ away from $(\kappa, +\infty)$ thanks to Itô's synthesis theorem.

Now in the continuous setting, a similar definition of $C^\kappa$ can be given by adapting the additive
functional $A^\kappa$ into

$$A^\kappa_t(e) := \int_0^t ds\, \mathbf{1}_{\{e_s \le \kappa\}}, \qquad t \ge 0.$$

It is easily seen that for any $\kappa' > \kappa$,

$$X^\kappa = C^\kappa(X^{\kappa'}).$$

Kolmogorov's extension theorem then ensures the existence of a family of processes $(X^\kappa; \kappa > 0)$
defined on the same probability space, satisfying pathwise the previous equality, and with the
right marginal distributions (reflected, killed Lévy processes).

All results stated in the remainder of the paper hold in the supercritical case if we replace
the Lévy process $X$ by the projective limit of $X^\kappa$ as $\kappa \to \infty$. More simply, we can construct
the same consistent family $(e^\kappa; \kappa > 0)$ for the excursion of $X - I$ away from 0. Then it is
sufficient to replace each excursion of $X - I$ drifting to $\infty$ (there are finitely many of them on
any compact interval of local time) by a copy of $e^\kappa$ for some sufficiently large $\kappa$. More precisely,
for the excursion corresponding to an infimum equal to $-x$ ($x$ is the index of the excursion in
the local time scale), we need the modified excursion $e^\kappa$ to be such that all heights below
$h := L^0(x)$ are visited. In other words, one has to choose $\kappa$ large enough so that for any $\kappa' > \kappa$,
the occupation measure of the height process of $e^{\kappa'}$ restricted to $[0, h]$ remains equal to that of
$e^\kappa$.

In order not to overload the reader with technicalities that are away from the core question
of this paper, we chose not to develop this point further, and we leave it to the reader to modify
the proofs of the next subsection in the obvious way at points where the supercritical case has
to be distinguished.



4.3 Law of the coalescent point process 

Now that we have the height process for the infinite CSB tree with an arbitrarily old genealogy
and the ancestral coalescence times with multiplicities describing the genealogy of its standing
population, we proceed to describe their law. Furthermore, as was our main goal, we define an
analogue of the coalescent point process with multiplicities $(B_i^\varepsilon; i \ge 0)$ for the CSB tree.

Our results show similarities with the results of Duquesne and Le Gall in [8, Section 2.7] for
the law of the reduced tree under the measure $N(\cdot \mid \sup H > T)$. However, our presentation
is quite different for a number of reasons. First, we do not characterize the branching times in
terms of the Markov kernel of the underlying CSB process, but rather treat these times as a
sequence. Second, the branching tree is allowed to be supercritical. Third, the tree may have
an infinite past.

Recall the definition of the coalescence times of individuals in the discretized standing population
$(A_i^\varepsilon; i \ge 1)$ and their multiplicities $(N_i^\varepsilon; i \ge 1)$ from (6) and (7).
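Theorem 4.1 The joint law of $(A_1^\varepsilon, N_1^\varepsilon)$ is given by

$$P(A_1^\varepsilon > x) = \frac{\tilde\psi(v(x))}{\tilde\psi(v(\varepsilon))}, \qquad x \ge \varepsilon, \tag{8}$$

where $\tilde\psi(u) := \psi(u)/u$ for any $u > \eta$. In addition, for any $n \ge 2$ and $x > \varepsilon$,

$$P(A_1^\varepsilon \in dx, N_1^\varepsilon = n)/dx = P(A_1^\varepsilon > x)\, v(x)^n \int_{(0,\infty)} \Lambda(dz)\, e^{-v(x) z}\, \frac{z^{n+1}}{(n+1)!},$$

whereas for any $x > \varepsilon$,

$$P(A_1^\varepsilon \in dx, N_1^\varepsilon = 1)/dx = P(A_1^\varepsilon > x)\, v(x) \left( \beta + \int_{(0,\infty)} \Lambda(dz)\, e^{-v(x) z}\, \frac{z^2}{2} \right).$$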



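Remark 3 The formula giving the joint law of $(A_1^\varepsilon, N_1^\varepsilon)$ can also be expressed (see the calculations
in the proof) as follows:

$$P(A_1^\varepsilon \in dx, N_1^\varepsilon = n)/dx = P(A_1^\varepsilon > x) \left( \beta\, v(x)\, \mathbf{1}_{\{n=1\}} + \int_{(0,+\infty)} \pi^{(x)}(dr)\, e^{-r v(x)}\, \frac{(r v(x))^n}{n!} \right),$$

showing that when $\beta = 0$, $N_1^\varepsilon$ actually follows a mixed Poisson distribution.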

Remark 4 As in the discrete case, observe that for subcritical trees ($\eta = 0$ and
$\psi'(0+) > 0$), coalescence times can take the value $+\infty$, corresponding to the delimitation of
quasi-stationary subpopulations with different (infinite) ancestor lineages. Indeed, from the
previous statement,

$$P(A_1^\varepsilon = +\infty) = \frac{\psi'(0+)}{\tilde\psi(v(\varepsilon))}.$$

In addition, the event $\{A_1^\varepsilon = +\infty\}$ is the event that the whole (quasi-stationary) population
has coalesced by time $\varepsilon$. In other words, if $V$ denotes the coalescence time of a quasi-stationary
population, also referred to as the time to most recent common ancestor, then

$$P(V \le x) = \frac{\psi'(0+)}{\tilde\psi(v(x))}.$$

In the discrete case, $V$ is studied in Section 5, where a slightly more detailed statement is given.
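As an illustration of formula (8) and of the atom at $+\infty$, consider the quadratic branching mechanism $\psi(u) = a u + \beta u^2$ with $a \ge 0$ and $\beta > 0$. This choice, the function names, and the identification of $v$ as the inverse of $\phi(y) = \int_y^\infty dz/\psi(z)$ are assumptions made for the sketch. Under them $v(x) = a / (\beta (e^{ax} - 1))$ (and $v(x) = 1/(\beta x)$ when $a = 0$), so the tail of $A_1^\varepsilon$ should reduce to $(1 - e^{-a\varepsilon})/(1 - e^{-ax})$, or $\varepsilon/x$ in the critical case.

```python
import math


def v(x: float, a: float, beta: float) -> float:
    """Inverse of phi(y) = int_y^inf dz / psi(z) for psi(u) = a*u + beta*u**2 (assumed form)."""
    if a == 0.0:
        return 1.0 / (beta * x)
    return a / (beta * math.expm1(a * x))


def psi_tilde(u: float, a: float, beta: float) -> float:
    """psi(u)/u for the quadratic branching mechanism."""
    return a + beta * u


def tail(x: float, eps: float, a: float, beta: float) -> float:
    """P(A_1^eps > x) = psi_tilde(v(x)) / psi_tilde(v(eps)), for x >= eps."""
    return psi_tilde(v(x, a, beta), a, beta) / psi_tilde(v(eps, a, beta), a, beta)


def mass_at_infinity(eps: float, a: float, beta: float) -> float:
    """P(A_1^eps = +inf) = psi'(0+) / psi_tilde(v(eps)), with psi'(0+) = a."""
    return a / psi_tilde(v(eps, a, beta), a, beta)
```

In the subcritical case ($a > 0$) the mass at infinity works out to $1 - e^{-a\varepsilon}$, which is the limit of the tail as $x \to \infty$, while in the critical case it vanishes.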



Proof of Theorem 4.1. First notice that $S_1$ is also the first hitting time of $-Y^0_\varepsilon$ by $X$.
Denote by $(s, e_s)$ the excursion process of $X - I$ away from 0, where $-I$ serves as local time.
Then the event $\{A_1^\varepsilon > x\}$ is the event that the excursions $(e_s; Y^0_\varepsilon \le s \le Y^0_x)$ all have
$\sup H(e_s) - L^0_s < 0$. As a consequence,

$$P(A_1^\varepsilon > x) = E \exp - \sum_{Y^0_\varepsilon \le s \le Y^0_x} \chi\{\sup H(e_s) < L^0_s\},$$

where $\chi\{A\} = 0$ on $A$ and $\chi\{A\} = +\infty$ on $A^c$, so that using the exponential formula for the
excursion point process,

$$P(A_1^\varepsilon > x) = E \exp - \int_{[Y^0_\varepsilon, Y^0_x]} ds\, N(\sup H > L^0_s) = E \exp - \int_{[Y^0_\varepsilon, Y^0_x]} ds\, v(L^0_s).$$

Now since $L^0$ is the right-inverse of $Y^0$, and because of the definition of $Y^0$ (Definition 3.3) in
terms of the Lebesgue measure and the Poisson point process $(u, \Delta_u)$, we get

$$\int_{[Y^0_\varepsilon, Y^0_x]} ds\, v(L^0_s) = \beta \int_{[\varepsilon, x]} du\, v(u) + \sum_{\varepsilon \le u \le x} v(u) \Delta_u,$$

so that

$$P(A_1^\varepsilon > x) = \exp - \left( \beta \int_{[\varepsilon,x]} du\, v(u) + \int_{[\varepsilon,x]} du \int_{(0,\infty)} \pi^{(u)}(dr) \left(1 - e^{-r v(u)}\right) \right). \tag{9}$$

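We compute the second term inside the exponential thanks to the Fubini-Tonelli theorem:

$$\begin{aligned}
\int_{(0,\infty)} \pi^{(u)}(dr) \left(1 - e^{-r v(u)}\right) &= \int_{(0,\infty)} dr\, e^{v(u) r} \int_{(r,\infty)} e^{-v(u) z} \Lambda(dz) \left(1 - e^{-r v(u)}\right) \\
&= \int_{(0,\infty)} \Lambda(dz)\, e^{-v(u) z} \int_{(0,z)} dr \left(e^{v(u) r} - 1\right) \\
&= \frac{1}{v(u)} \int_{(0,\infty)} \Lambda(dz) \left(1 - e^{-v(u) z} - v(u) z\, e^{-v(u) z}\right).
\end{aligned}$$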

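Finally, we get

$$P(A_1^\varepsilon > x) = \exp - \int_{[\varepsilon, x]} du\, F \circ v(u),$$

where

$$F(\lambda) := \beta \lambda + \frac{1}{\lambda} \int_{(0,\infty)} \Lambda(dz) \left(1 - e^{-\lambda z} - \lambda z\, e^{-\lambda z}\right), \qquad \lambda > \eta.$$

But elementary calculus shows that

$$F(\lambda) = \psi'(\lambda) - \tilde\psi(\lambda), \qquad \lambda > \eta,$$

so that by the change of variable $y = v(u)$, $u = \phi(y)$, we get

$$\begin{aligned}
P(A_1^\varepsilon > x) &= \exp - \int_{[v(x), v(\varepsilon)]} \frac{dy}{\psi(y)}\, F(y) \\
&= \exp - \int_{[v(x), v(\varepsilon)]} dy \left( \frac{\psi'(y)}{\psi(y)} - \frac{1}{y} \right) \\
&= \frac{\tilde\psi(v(x))}{\tilde\psi(v(\varepsilon))},
\end{aligned}$$

which shows the first part of the theorem.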
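As for the second part, the event $\{A_1^\varepsilon \in dx, N_1^\varepsilon = n\}$ is the event that the excursions
$(e_s; Y^0_\varepsilon \le s \le Y^0_{x-dx})$ all have $\sup H(e_s) - L^0_s < 0$, and $n$ is the number of excursions $(e_s; Y^0_{x-dx} <
s \le Y^0_x)$ (for which $L^0_s \in dx$) such that $\sup H(e_s) > x$. Next notice that $Y^0_x - Y^0_{x-dx} =
\beta\, dx + (Y^0_x - Y^0_{x-}) \mathbf{1}_{E^0(dx)}$, where $E^0(dx)$ is the event that $Y^0$ has a jump in the interval
$(x - dx, x)$. As a consequence, by the compensation formula applied to the Poisson point
process of jumps of $Y^0$,

$$P(A_1^\varepsilon \in dx,\ Y^0_x - Y^0_{x-} \in dr,\ N_1^\varepsilon = n) = P(A_1^\varepsilon > x)\, dx\, \pi^{(x)}(dr)\, e^{-r v(x)}\, \frac{(r v(x))^n}{n!},$$

since $N(\sup H > x) = v(x)$. Also

$$P(A_1^\varepsilon \in dx,\ Y^0_x = Y^0_{x-},\ N_1^\varepsilon = n) = 0$$

if $n \ge 2$, whereas

$$P(A_1^\varepsilon \in dx,\ Y^0_x = Y^0_{x-},\ N_1^\varepsilon = 1) = P(A_1^\varepsilon > x)\, \beta\, dx\, v(x).$$

As a consequence, for any $n \ge 2$,

$$\begin{aligned}
P(A_1^\varepsilon \in dx, N_1^\varepsilon = n) &= dx\, P(A_1^\varepsilon > x)\, \frac{v(x)^n}{n!} \int_{(0,\infty)} \pi^{(x)}(dr)\, e^{-r v(x)}\, r^n \\
&= dx\, P(A_1^\varepsilon > x)\, \frac{v(x)^n}{n!} \int_{(0,\infty)} dr\, r^n \int_{(r,\infty)} \Lambda(dz)\, e^{-z v(x)} \\
&= dx\, P(A_1^\varepsilon > x)\, v(x)^n \int_{(0,\infty)} \Lambda(dz)\, e^{-v(x) z}\, \frac{z^{n+1}}{(n+1)!},
\end{aligned}$$

which is the desired result. The last result can be obtained by the same calculation. Summing
over $n$ yields

$$\begin{aligned}
P(A_1^\varepsilon \in dx) &= dx\, P(A_1^\varepsilon > x) \left( \beta v(x) + \frac{1}{v(x)} \int_{(0,\infty)} \Lambda(dz) \left(1 - e^{-v(x) z} - v(x) z\, e^{-v(x) z}\right) \right) \\
&= dx\, P(A_1^\varepsilon > x)\, F \circ v(x),
\end{aligned}$$

as expected, since $P(A_1^\varepsilon > x) = \exp - \int_{[\varepsilon, x]} du\, F \circ v(u)$. □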
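The two identities used in this proof, namely $F(\lambda) = \psi'(\lambda) - \psi(\lambda)/\lambda$ and the fact that the multiplicity weights sum to $F$, can be checked numerically on an example. The sketch below assumes the illustrative Lévy measure $\Lambda(dz) = c\, e^{-z}\, dz$ and the usual form $\psi(\lambda) = \alpha\lambda + \beta\lambda^2 + \int (e^{-\lambda z} - 1 + \lambda z)\Lambda(dz)$; both are assumptions of the sketch, as are the function names. With this choice all integrals are explicit: the jump part of $\psi$ equals $c\lambda^2/(1+\lambda)$ and $\int \Lambda(dz)\, e^{-\lambda z} z^{n+1}/(n+1)! = c/(1+\lambda)^{n+2}$.

```python
def psi(lam: float, alpha: float, beta: float, c: float) -> float:
    """Branching mechanism for Lambda(dz) = c * exp(-z) dz (illustrative choice)."""
    return alpha * lam + beta * lam**2 + c * lam**2 / (1.0 + lam)


def dpsi(lam: float, alpha: float, beta: float, c: float) -> float:
    """Derivative of psi with respect to lam."""
    return alpha + 2.0 * beta * lam + c * (2.0 * lam + lam**2) / (1.0 + lam) ** 2


def F(lam: float, alpha: float, beta: float, c: float) -> float:
    """F(lam) = psi'(lam) - psi(lam)/lam."""
    return dpsi(lam, alpha, beta, c) - psi(lam, alpha, beta, c) / lam


def multiplicity_weight(n: int, lam: float, beta: float, c: float) -> float:
    """lam**n * int Lambda(dz) exp(-lam*z) z**(n+1)/(n+1)!, plus beta*lam when n == 1.

    With lam = v(x), this is P(A in dx, N = n) / (dx * P(A > x)).
    """
    weight = c * lam**n / (1.0 + lam) ** (n + 2)
    if n == 1:
        weight += beta * lam
    return weight
```

Dividing `multiplicity_weight(n, v(x), ...)` by `F(v(x), ...)` then gives the conditional law of the multiplicity given that the coalescence time equals $x$.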

Finally, we define the coalescent point process with multiplicities for the $\varepsilon$-discretized standing
population. At the end of Section 2.2 we gave an alternative characterization of $(B_k; k \ge 0)$
for the BGW tree as a sum of point masses, see (1), whose multiplicities record the number
of their future appearances as coalescence times, until the first future appearance of a larger
coalescence time. Since in the CSB tree we do not have a process analogous to $(D_i; i \ge 1)$, we
actually define the $\varepsilon$-discretized coalescent point process with multiplicities from this angle.

For each $i \ge 1$ and $k \ge i$ we define $N^\varepsilon_{ik}$ as the residual multiplicity of $A_i^\varepsilon$ at time $k$,

$$N^\varepsilon_{ik} := \#\{j \ge k : A_j^\varepsilon = A_i^\varepsilon\},$$

so $N^\varepsilon_{ii} = N^\varepsilon_i$. Next define for each $k \ge 1$ the random finite point measure $B^\varepsilon_k$ on $(\varepsilon, +\infty]$ as



B k '■= N ik$Af If a? <A £ , 
where $\delta$ denotes a Dirac measure, and by convention let $B^\varepsilon_0$ be the null measure.

Recall that $s$ is a mapping from a point measure on $\mathbb{R}$ to the minimum of its support.
The following result provides the law for the coalescent point process with multiplicities of the
$\varepsilon$-discretized population in the CSB tree.

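Theorem 4.2 The sequence of finite point measures $(B_i^\varepsilon; i \ge 0)$ is a Markov chain started at
the null measure. For any finite point measure $b = \sum_{j \ge 1} n_j \delta_{a_j}$, with $n_j \in \mathbb{N}$ and $a_j \in \mathbb{R}_+$,
the law of $B^\varepsilon_{k+1}$ conditionally given $B^\varepsilon_k = b$ is given by the following transition. Let $a_1 :=
s(b)$, $b^* := b - \delta_{a_1}$, $a_1^* := s(b^*)$. Let $(A, N)$ be a r.v. with values in $(\varepsilon, +\infty] \times \mathbb{N}$ distributed as
$(A_1^\varepsilon, N_1^\varepsilon)$. Then

$$B^\varepsilon_{k+1} := \begin{cases} b^* + N \delta_A & \text{if } A < a_1^*, \\ b^* & \text{otherwise.} \end{cases}$$

In addition, as in the discrete case, the coalescent point process can be deduced from $B^\varepsilon$ as

$$A_i^\varepsilon = s(B_i^\varepsilon), \qquad i \ge 1.$$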



Remark 5 When $X$ is a diffusion, the measure $\rho^0$ has no atoms, so there are no repeats of
coalescence times ($N = 1$ a.s.). In this case, at each step of the chain there is only one nonzero
mass, that is, $B^\varepsilon_k = \delta_{A^\varepsilon_k}$ for all $k \ge 1$. The previous statement shows that the sequence starts with
$B^\varepsilon_1 = \delta_{A^\varepsilon_1}$, where $A^\varepsilon_1$ is distributed as $A$, and in every transition the single point mass from the
previous step is erased and a new point mass $\delta_{A^\varepsilon_{k+1}}$ is added, with independently chosen $A^\varepsilon_{k+1}$
distributed as $A$. So, the random variables $(A_i^\varepsilon)_{i \ge 1}$ are i.i.d.
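The transition of the chain of point measures can be sketched in a few lines. The representation of a point measure as a dictionary from atom to multiplicity, the convention $s(\text{null measure}) = +\infty$, and all function names are assumptions of this sketch; `draw` stands for any sampler of a pair $(A, N)$ with the law of $(A_1^\varepsilon, N_1^\varepsilon)$. For the diffusion case we use the Brownian example $\psi(u) = \beta u^2$, for which formula (8) gives $P(A > x) = \varepsilon/x$ and $N = 1$.

```python
import math
import random
from typing import Callable, Dict, Tuple

PointMeasure = Dict[float, int]  # atom location -> multiplicity


def support_min(b: PointMeasure) -> float:
    """s(b): minimum of the support; +infinity for the null measure (assumed convention)."""
    return min(b) if b else math.inf


def step(b: PointMeasure, draw: Callable[[], Tuple[float, int]]) -> PointMeasure:
    """One transition B_k -> B_{k+1} of the chain of point measures."""
    b_star = dict(b)
    if b_star:
        a1 = support_min(b_star)        # a_1 := s(b)
        b_star[a1] -= 1                 # b* := b - delta_{a_1}
        if b_star[a1] == 0:
            del b_star[a1]
    a1_star = support_min(b_star)       # a_1* := s(b*)
    A, N = draw()
    if A < a1_star:                     # add N * delta_A only below a_1*
        b_star[A] = b_star.get(A, 0) + N
    return b_star


def draw_brownian(eps: float, rng: random.Random) -> Tuple[float, int]:
    """Brownian example: P(A > x) = eps / x for x >= eps, and N = 1."""
    return eps / (1.0 - rng.random()), 1
```

Reading off `support_min` along the chain recovers the coalescence times; in the Brownian example every state is a single unit mass, in line with the remark above.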

On the other hand, when $X$ is a diffusion (and only in that case, see [8]), the height process $H$
is a Markov process. The coalescence times $(A_i^\varepsilon; i \ge 1)$ are just the depths of excursions of $H$
(with depth greater than $\varepsilon$) below some fixed level. This again implies that the $(A_i^\varepsilon)_{i \ge 1}$ are i.i.d.
Moreover, by taking $\varepsilon \to 0$ in equation (8), one can compute the intensity measure $\mu$ of the
Poisson point process of excursion depths (its tail is given by $\bar\mu(x) = \tilde\psi(v(x))$). In particular
in the Brownian case, it is known that the height process is (reflected) Brownian, so that $\mu(dx)$
is proportional to x~ 2 dx (see £0 [28^). 

For the BGW tree, the discrete analogue of the height process $H$ is, again, in general not a
Markov process, except in special cases of the offspring distribution. Namely, the only exceptions
are when $\xi$ is linear-fractional (see Section 5.1 for the definition). In these cases we will also observe
that the coalescence times $(A_i; i \ge 1)$ are i.i.d. (see Proposition 5.1).

Remark 6 Even in general, i.e., in the presence of multiplicities, we know that there exists a
coalescent point process whose truncation at level $\varepsilon$ is the sequence $(A_i^\varepsilon; i \ge 1)$. It is the process
of excursion depths of the height process in the local time scale. However, the question of
characterizing the distribution of this coalescent point process (without truncation) remains open. A
natural idea would be to use a Poisson point process with intensity measure $dt \sum_{n \ge 1} \nu(dx, \{n\})\, \delta_n$,
where

$$\nu(dx, \{n\})/dx = \tilde\psi(v(x))\, v(x)^n \left( \beta\, \mathbf{1}_{\{n=1\}} + \int_{(0,\infty)} \Lambda(dz)\, e^{-v(x) z}\, \frac{z^{n+1}}{(n+1)!} \right).$$

Indeed, since $(A_1^\varepsilon, N_1^\varepsilon)$ has the same law as the first atom of this point process conditional on
its first component being larger than $\varepsilon$, it is easy to embed the sequence $(A_i^\varepsilon; i \ge 1)$ into the
atoms of this Poisson point process, by truncating atoms larger than those whose multiplicity
has not yet been exhausted. However, preliminary calculations indicate that convergence of this
new point process as $\varepsilon \to 0$ is not evident.



Proof of Theorem 4.2. We need to introduce some notation. For any stochastic process
W admitting a height process in the sense of 0, we denote by H w its height process, that is, 



where 



$$I^W_{s,t} := \inf_{s \le r \le t} W_r.$$



Further, if W has finite lifetime, denoted T, we denote by p w the great-aunt measure associated 
with W, in the sense that 



<P W J>:= [ d s I s WtT f(H^ -Hf) 
J\0.T] 



'[0,T] 

In particular, if $W$ is the Lévy process killed at $\sigma_h$ under $N(\cdot \mid \sup H > h)$, then we know from
Theorem 3.4 that $\rho^W$ and the trace of $\rho^0$ on $[0, h]$ are equally distributed. Now if $\mu$ is a positive
measure on $\mathbb{R}_+$ and $L^\mu$ denotes the right-inverse of the nondecreasing function $x \mapsto \mu([0,x])$,
and if $I^W$ denotes the past infimum of the shifted path $W - W_0$, we denote by $\Phi(\mu, W)$ the
generalized height process

$$\Phi_t(\mu, W) := H^W_t - L^\mu(-I^W_t), \qquad t \ge 0.$$

In particular, in the (sub)critical case, $H^\star$ is distributed as $\Phi(\rho^0, X)$. Finally, recall that $k$
denotes the killing operator and $\theta$ the shift operator, in the sense that $X \circ \theta_t = (X_{t+s}; s \ge 0)$
and $X \circ k_t$ is $(X_{s \wedge t}; s \ge 0)$. For any path $(X_s; s \ge 0)$ and any positive real number $t$, if
$W := X \circ k_t$ and $W' := X \circ \theta_t$, then it is not hard to see that for any $s \ge 0$,

$$H^X_{t+s} = \Phi_s(\rho^W, W') + H^W_t. \tag{10}$$

In particular, applying this to $X$ under $N(\cdot \mid \sup H > h)$ and to $t = \sigma_h$, the strong Markov
property at $\sigma_h$ yields

$$H^X \circ \theta_{\sigma_h} \overset{d}{=} \Phi(\rho, X \circ \theta_{\sigma_h}) + h, \tag{11}$$

where $\rho$ is a copy of $\rho^0$ independent of $X$.

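Now we work conditionally on $(A_1^\varepsilon, N_1^\varepsilon) = (a, n)$. Recall that $S_1 = \inf\{t \ge 0 : H^\star_t = -\varepsilon\}$
and $T_1 = \inf\{t \ge S_1 : H^\star_t = 0\}$. Set $V_1 := T_1$ and define recursively for $i = 1, \ldots, n$

$$U_i := \sup\{t < V_i : H^\star_t = -a\},$$
$$W_i := \inf\{t > V_i : H^\star_t = -a\},$$
$$V_{i+1} := \inf\{t > W_i : H^\star_t = 0\}.$$

Last define

$$W_0 := \inf\{t \ge 0 : X_t = -Y^0_{a-}\} \quad\text{and}\quad U_{n+1} := \inf\{t \ge 0 : X_t = -Y^0_a\}.$$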
As seen in the proof of Theorem 4.1, the subpaths $e^i := (X_{t+U_i} - X_{U_i};\ 0 \le t \le W_i - U_i)$
are the n excursions of X above its infimum whose height reaches level a, so that for all 
t G [Woj^n+i]) It G (— Y®_,Y®], and H* = H t — a. An application of the strong Markov 
property yields the independence of the n subpaths e 1 and the fact that they are all distributed 
as N(- | sup if > a). Also X' := (X t+ jj n+1 — Xjj n+1 ;t > 0) is a copy of X independent from 
all the previous subpaths and from Y°. Notice that for all t G [Wo,i7n+i]) H* > —a. As a 
consequence, by continuity of the height process, H* takes the value —a at all points of the 
form Ui and Wi, it hits only in intervals of the form [Vj,Wj), and takes values in [— a, 0) 
on all intervals of the form [Wi, £7j+i]. As a consequence, if we excise all the paths of H* on 
intervals of the form (Wi, f/f+i), i = 0, . . . ,n, we will still get the same coalescent point process. 
Also notice that those paths are independent of the n excursions e l and only depend on Y° 
through AY a °. As a consequence, recalling the notation in Subsection 14.21 if -4 e denotes the 
mapping that takes a height process to its sequence of e-discretized coalescent levels, that is, 
A 6 : H* i-> (A\,A\, • • • ), where A\ , A\, . . . are defined by © then we have 

A £ (H*) = (a, A £ (H(e 1 ) - a), a, ... , A £ (H(e n ~ l ) -a), a, A £ (H' n )) 

where H' n is the concatenation of H(e n ) — a and of Q(p° o 9 a ,X'), writing p° o 8 a for the 
measure associated with the jump process (Y® +a — Y®;s > 0). First observe that, as long 
as the coalescent point process is concerned, we can again excise the parts of each of the n 
subpaths in the previous display going from height —a to height 0. This amounts to considering 
excursions e l only from the first time a a they reach height a. But recall from (|lip that 

H(e) o 0(j a — a = <E>(/9, e o 6 Ua ), which is distributed as the process H* killed upon reaching 
—a. Second, by the same argument as previously, erasing the part of H' n before its first 
hitting time of 0, we get the concatenation of a copy of H* killed upon reaching —a and of 
§(p° o a ,X'), where we remember that X' is an independent copy of X. The result has the 
law of $(/0°, X), i.e., it has the law of H* . In conclusion, this gives the following conditional 
equality in distribution 

A £ (H*) = (a, A £ (H{), a,..., A £ (H^),a, A £ (H'^)), 

where H'^ is a copy of H* and the H* are independent copies of H* killed upon reaching —a, 
all independent of H". 

Now observe that since the law of $A_1^\varepsilon$ is absolutely continuous, the branch length $a$ will
occur exactly $n$ times (in particular, it will not appear in $\mathcal{A}^\varepsilon(H'_n)$). Also, because all the heights
between successive occurrences of $a$ are smaller than $a$, the following conditional equality can
be inferred from the previous display and the definition of the sequence $B^\varepsilon := (B_i^\varepsilon)$ of point
measures

B £ = (0, n5 a , (n - l)8 a + B 1 , (n - 2)8 a , ...,5 a + B n ~ l , S a , B% 

where $B'$ is a copy of $B^\varepsilon$ and the $B^i$ are independent copies of the sequence of point measures
associated with a coalescent point process $\mathcal{A}^\varepsilon$ killed at its first value greater than $a$ (in the
usual sense that this value is not included in the killed sequence). In passing, an induction
argument on the cardinal of the support of $B^\varepsilon$ shows that $\mathcal{A}^\varepsilon = s(B^\varepsilon)$. Also, each $B^i$ is reduced
to a sequence of length 1 with single value 0, with probability $P(A_1^\varepsilon > a)$. Otherwise, it starts
in the state $N\delta_A$, where $(A, N)$ has the law of $(A_1^\varepsilon, N_1^\varepsilon)$ conditional on $A_1^\varepsilon < a$. Note that the
sequence following the state $\delta_a$, which corresponds to the case when $s(b) \ne s(b^*)$, has the same
law as $B^\varepsilon$, dropping the dependence upon $a$.
Now assume that $B_k^\varepsilon = \sum_{j \ge 1} n_j \delta_{a_j}$. The following conclusions follow by induction from the
last assertion. If n\ > 2 (i.e., a* = a±) and K is the first time after k + 1 that a\ has its multi- 
plicity decreased, then the sequence (Bf; k + 1 < i < K) is independent of (Bf; i < k) and has 
the law of the sequence associated with a coalescent point process killed at its first value greater 
than a\. If ri\ = 1 (i.e., a\ = 02) and K is the first time (after k) that 02 has its multiplicity 
decreased, then the sequence (Bf; k + 1 < i < K) is independent of (Bf; i < k) and has the law 
of the sequence associated with a coalescent point process killed at its first value greater than 
ai. In particular, Bf +1 = Bf, — 5 ai with probability ¥(A £ > a*), and B e k+l = Bf, — S ai + N5a, 
with probability P(v4 e < a|), where (A, JV) has the law of (Af,iVf) conditional on A\ < a\. □ 



4.4 Convergence of the coalescent point process 

We now present the theorem that connects the discrete case coalescent process, based on the
offspring distribution $\xi_p$, to the continuous case coalescent process, based on the associated
branching mechanism $\psi$. Let us assume, as in Section 3.4, that there exist a sequence $(\gamma_p, p \ge 1)$
with $\gamma_p \to \infty$ as $p \to \infty$ and a sequence of r.v. $(\xi_p, p \ge 1)$ such that the BGW process
started at $\lfloor px \rfloor$ with offspring distribution $\xi_p$, once suitably rescaled, converges in Skorokhod
space to a CSB process $Z$ with branching mechanism $\psi$ started at $x$.

In order to obtain convergence for the coalescent processes we need to define a discretized
version of the discrete case process by considering only the individuals whose coalescence times
are at least $\gamma_p \varepsilon$ for some fixed $\varepsilon > 0$. Recall that for a point measure $b$, $s(b)$ denotes the
minimum of its support. Start with the sequence of finite point measures $(B_i; i \ge 0)$ whose law
is given by Theorem 2.2. Let $\tau_0 := 0$, and for any $i \ge 1$, let

$$\tau_i := \inf\{n > \tau_{i-1} : s(B_n) \ge \gamma_p \varepsilon\}.$$

Define the $\gamma_p \varepsilon$-discretized process of point measures

$$(B_i^{p,\varepsilon}; i \ge 0) := (B_{\tau_i}; i \ge 0).$$

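Let $\mathcal{B} = \{\sum_{i=1}^n b_i \delta_{a_i} : n \in \mathbb{N}, b_i \in \mathbb{N}, a_i \in \mathbb{R}_+\}$ be the space of all finite point measures
on $\mathbb{R}_+$, equipped with the usual vague topology. Let $\mathcal{R}^p : \mathcal{B} \to \mathcal{B}$ be a function rescaling the
point measures so that

$$\mathcal{R}^p\Big(\sum_{i=1}^n b_i \delta_{a_i}\Big) = \sum_{i=1}^n b_i \delta_{\gamma_p^{-1} a_i}.$$

Since $(B_i^{p,\varepsilon}; i \ge 0)$ is a Markov chain on $\mathcal{B}$, it is a random element of $\mathcal{B}^{\{0,1,2,\ldots\}}$, equipped with
the product of vague topologies on $\mathcal{B}$.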
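The subsampling at the times $\tau_i$ and the rescaling map can be sketched as follows, with point measures stored as dictionaries from atom to multiplicity. The function names, the dictionary representation, the convention $s(\text{null measure}) = +\infty$ and the boundary convention $s(B_n) \ge \gamma_p \varepsilon$ are assumptions of this sketch.

```python
import math
from typing import Dict, List, Sequence

PointMeasure = Dict[float, int]  # atom location -> multiplicity


def discretize(chain: Sequence[PointMeasure], gamma_p: float, eps: float) -> List[PointMeasure]:
    """Keep B_0 and every later B_n whose smallest atom is at least gamma_p * eps."""
    threshold = gamma_p * eps
    kept = [chain[0]]                                   # tau_0 := 0
    for b in chain[1:]:
        if min(b, default=math.inf) >= threshold:       # s(B_n) >= gamma_p * eps
            kept.append(b)
    return kept


def rescale(b: PointMeasure, gamma_p: float) -> PointMeasure:
    """R^p: move every atom a to a / gamma_p, keeping its multiplicity."""
    return {a / gamma_p: n for a, n in b.items()}
```

Applying `rescale` to each element of `discretize(chain, gamma_p, eps)` gives a finite stretch of the rescaled discretized chain whose convergence is studied below.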

Theorem 4.3 The sequence of rescaled discretized Markov chains $(\mathcal{R}^p(B_i^{p,\varepsilon}); i \ge 0)$ converges
in distribution on the space $\mathcal{B}^{\{0,1,2,\ldots\}}$ to the Markov chain $(B_i^\varepsilon; i \ge 0)$ whose law is given by
Theorem 4.2, as $p \to \infty$.






Proof. Note that the initial values for the sequence $\mathcal{R}^p(B_0^{p,\varepsilon})$ as well as for the limit $B_0^\varepsilon$
are simply null measures. In order to describe the transition law of the discretized process
$(B_i^{p,\varepsilon}; i \ge 0)$, condition on its value at step $i$,

$$B_{\tau_i} = b \quad \text{with } a_1 = s(b) \ge \gamma_p \varepsilon.$$

Condition further on the values of $\tau_i$ and $\tau_{i+1}$ for the unsampled process $(B_k; k \ge 1)$. With
$b^* = b - \delta_{a_1}$, we have that for all $1 \le j \le \tau_{i+1} - \tau_i - 1$,

$$B_{\tau_i + j} = b^* + \sum_{j \ge 1} n_j \delta_{a_j} \quad \text{with } a_j < \gamma_p \varepsilon,\ \forall j \ge 1,$$

since by definition of $\tau_{i+1}$, for $1 \le j \le \tau_{i+1} - \tau_i - 1$ the smallest mass in $B_{\tau_i + j}$ must be smaller
than $\gamma_p \varepsilon$, and in each step only the weight of the smallest mass is decreased.

On the event r^i = Tj + 1, the transition rule from Theorem 12.21 for step Tj to Tj + i gives 
that, for a\ = s(b*), 

( b* + NP' e 5 AP ,e if AP> e < oj and A*>' £ ^ ai 
Tl+1 ~ \ 6* otherwise ^ J 

where (A p ' £ , N p ' £ ) is distributed as (A±, Q' A _ ) conditional on A\ > 7 p e. The conditioning in this 
law follows from the definition of Tj+i as the first time after Tj for which s(Bff 1 ) > 7 p e. 

On the event Tj+i > Tj + 1, since sf^-Bf^) > 7 P e, by step Tj + i — 1 all of the masses from 
^2j>i n j^a,j mus t have been eliminated except for one mass a that is smaller than -y p e whose 
weight at this step is 1, so 

B n+1 -i = b* + 5 a with a < j p e. 

Since the smallest mass in S T . +1 _i is 5 a the transition rule from Theorem 12.21 for step Tj + i — 1 
to Tj+i gives that B* _ t = B n+1 ^i — 5 a = b* , so s(B* _ ± ) = s(b*) = a\. Also, a new 
mass (A,N) is added only if A < a\ and A ^ a = s(6* + 5 a ). Note that we must also have 
A > 7 P e as in the next step s(B n+1 ) > j p e, so the added mass is again distributed as (A±, £ A ) 
conditional on A\ > ^ p e. Integrating over possible values for B Ti+1 -±, Ti, and Tj+i, we have 
that conditionally given B p ' £ = b the transition rule for Bf^ is 

f B*. + NP' £ 5 A v,z if A p ' £ < a\ and A p ' £ ^ ai 
r i+i | _g* otherwise 

where (^4 P ' £ , iV p ' £ ) is distributed as (A\,£ A ) conditional on A\ > 7 p e. 

For the rescaled process 1Z P (B P ' £ ), the transition rule for 72^(5^) conditional on the value 
of W{B P ' £ ) = W{b) is then 

B p ' £ ) = \ + NP ' £5 ^^ if AP ' £ < a i and ^ ^ ai 

1 4+1 j 1 otherwise 

where we define K p { ai ) := 5{W(b)), K p {b)* := K p {b) - 5 nP{ai) and W{a\) := 5{W{b)*). 
If we can now show convergence in distribution of 



(7- 1 ^ e ,^ £ ) = ( 7p - 1 ^i,a)l^i>7 P ^ — ► (At,NP 

p— >oo 
then our claim on point process convergence will follow from the description of the transition
rule for $(B_i^\varepsilon; i \ge 1)$ from Theorem 4.2 and the standard convergence arguments for a sequence
of Markov chains based on weak convergence of their initial values and transition laws.

We now express (A%, Q' A ) in terms of the great-aunt measure p^ of a single quasi stationary 
BGW tree with offspring distribution £ p . Consider the (0, 1) individual in our doubly infinite 
embedding of quasi-stationary BGW genealogies. Recall, see Remark [21 that (C' n ', n > 1) is a 
sequence of independent r.v. which conditionally on (p„ , n > 1) are binomial with parameters 
p^and p n _ 1 = P(z£V0|4 p) = l). 

First for A±, take any x > e, then 

1"/p x \ hp x \ hp x \ 

F(A 1>lp x\A 1 > lp e) = H P(£ = 0)=E ]J = Eexp ( £ ln(l-pi_i)) 

Let t^I denote the extinction time of a BGW process started with Z^f 1 = p, then 
1 - p n ^ = P(ZJW = 0|4 P) = 1) = P( T « < n - I) 1 /* 

Let r cxt denote the extinction time of the CSB process Z started with Zq = 1, then the 
assumption lim infp^oo P(Z^ j = 0) > guarantees that whenever 7^ 1 i p ->i we have 

(1 -*VJ P = P(7p < 7 p \ - lv~ X )^J^ <i) = 
Let us define f e,x (-) = v(-)l\ £jX ](') an d a sequence of functions {f p } such that 

= fp'%% 1 -) = -Hi-p._ 1 r x e P x (-) 

where Xp X is a sequence of bounded continuous functions approximating l[r7„e"|,L7i>a:J] converging 
pointwise to l\ etX ](-)- Then, we have that fp^i'J^ip) — > f e,x (i) = v(i)l £ <i< x as p — > oo 
whenever 7j^ 1 i p — > i. By (iii) of Theorem 13.41 it follows that 

1"/p x \ (p) 

£ ft_ln(l_p )P ^_ <p o i > 

* — » p p— s-co 1 J 



showing that 



P(Ai > -f p x\Ai > 7 p e) — ► Eexp (- < p°,vlr e ,i > ) 



p— >oo 



By Definition 13.31 and equation © we have 

Eexp (— < p°, vl[ £ x] > ) = exp -((3 [ duv(u) + [ du [ tt (u) (dr) (l - e~ rv{u) 

V J[e,x] J[e,x] 7 [0,oo ) V 

which together with the results of Theorem 14.11 proves that 7 P _1 Ai|Ai > j p e —tAf. 

We next show that for any y > e the sequence J2]=-y p£ C' converges as p — > oo to a Cox 
point process on [e,y] whose intensity measure given p° is < p G ,vl\ £>y \ >. For any A > 

h P y\ h P y\ h P y\ 

Eexp(-A^ C0=E J] O-Pi-i+Pi-ie-*)"*" =Eexp( £ ln(l- K _ 1+K _ ie - A )) 
Let us define $g^{\varepsilon,y}(\cdot) = v(\cdot)(1 - e^{-\lambda}) \mathbf{1}_{[\varepsilon,y]}(\cdot)$ and a sequence of functions $\{g_p^{\varepsilon,y}\}$ such that

flg&O = f/iip 1 -) = -Hi- P ._ 1 + p ,_ 1 e-^Yx £ P y (-) 

Since, whenever ")~ x i p — > i we have 

p ip = P(Z« / |Z « = 1) = 1 - P( T W < g 1 ^ « 1 - P(r cxt < 7p " 1 i p ) 1 / P « c^ipj/p 

it follows that <jr|' y {'J p 1 i P ) — > g e,y (i) = v(i)(l — e~ x )l e<i<y ] as p — > oo whenever r y p ~ l i p — > i. By 
(nil of Theorem 13.41 



L7pJ/J 

Eexp(-Ap- 1 V CO — > Eexp( < p°, wl [e , tf] > (e" A - 1)) 



p— too 
i=\l P e~\ 

showing that (e.g. Theorem 16.29 of |17j ) 

li P y\ 



where 5 is a Cox point process whose intensity measure conditionally on p° is < p°,v >. 
Finally, take e < x' < x < x" < y, such that 7 P (x' — x) — > and 'y p {x" — x) — > as p — > oo. 

P( 7p x' < <7 P x",C^ = n| Ai > 7p e) 

= P( 7? y < Ai < 7 P ^'l M > 7p e) P(C^ = n\ j p x' < A x < lp x") 
— ► ¥{A\ € dx)¥(E [£M (dx) =n\E [etV] (dx) + 0) 

P — tOO L ?f J L i»J / 

Evaluating the probability that the point process with intensity measure $\langle \rho^0, v \rangle$ takes on
values $n = 1$ and $n \ge 2$ at height $x$ gives precisely the formulae given in Theorem 4.1,
showing that for all $n \ge 1$ and $x > \varepsilon$,

$$P(A_1 \in \gamma_p\, dx,\ \zeta'_{A_1} = n \mid A_1 \ge \gamma_p \varepsilon) \xrightarrow[p \to \infty]{} P(A_1^\varepsilon \in dx,\ N_1^\varepsilon = n),$$

which completes the proof. □



5 Two applications in the discrete case 

In this section, we come back to the discrete case to display two further results on the coalescent 
point process. 

5.1 The linear-fractional case

A BGW process is called linear-fractional if there are two probabilities $a$ and $b$ such that

$$f(s) = a + \frac{(1-a)(1-b)s}{1 - bs}, \qquad s \in [0,1].$$

In other words, $\xi$ is the product of a Bernoulli r.v. with parameter $(1-a)$ and a geometric r.v.
with parameter $(1-b)$ conditional on being nonzero. The expectation $m$ of $\xi$ is equal to

$$m = \frac{1-a}{1-b},$$

so that this BGW process is (sub)critical iff $a \ge b$. In general, the coalescent point process
$(A_i, i \ge 1)$ is not itself a Markov process, but in the linear-fractional case, it is a sequence of
i.i.d. random variables. An alternative formulation of this observation was previously derived
in [29].



Proposition 5.1 In the linear-fractional case with parameters $(a, b)$, the branch lengths of the
coalescent $(A_i, i \ge 1)$ are i.i.d. with distribution given by

$$P(A_1 > n) = \frac{b - a}{b\, m^n - a}$$

when $a \ne b$, and when $a = b$ (critical case), by

$$P(A_1 > n) = \frac{1 - a}{na + 1 - a}.$$
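The tail of $A_1$ in the linear-fractional case is simple enough to sanity-check numerically. The sketch below assumes the offspring law $P(\xi = 0) = a$, $P(\xi = k) = (1-a)(1-b)b^{k-1}$ for $k \ge 1$, with generating function $f(s) = a + (1-a)(1-b)s/(1-bs)$ and mean $m = (1-a)/(1-b)$, and the tail formulae $(b-a)/(b m^n - a)$ for $a \ne b$ and $(1-a)/(na + 1 - a)$ for $a = b$. The function names are hypothetical.

```python
def offspring_pgf(s: float, a: float, b: float) -> float:
    """f(s) = a + (1-a)(1-b)s / (1 - b s), the linear-fractional generating function."""
    return a + (1.0 - a) * (1.0 - b) * s / (1.0 - b * s)


def mean_offspring(a: float, b: float) -> float:
    """m = (1-a)/(1-b)."""
    return (1.0 - a) / (1.0 - b)


def tail(n: int, a: float, b: float) -> float:
    """P(A_1 > n) for a linear-fractional BGW process with parameters (a, b)."""
    if a == b:  # critical case
        return (1.0 - a) / (n * a + 1.0 - a)
    m = mean_offspring(a, b)
    return (b - a) / (b * m**n - a)
```

Two elementary checks are available: $P(A_1 > 1)$ should equal $1 - b$, the probability that a mother with at least one child has exactly one, and the non-critical formula should approach the critical one as $b \to a$.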



Proof. Recall from Theorem 2.1 that $A_i = \min\{n \ge 1 : D_i(n) \neq 0\}$, so in particular, we
can set $A_0 := +\infty$. We prove by induction on $i \ge 1$ the following statement $(S_i)$: the
random variables $(D_i(n); n \ge 1)$ are independent r.v. distributed as $\zeta'_n$, and independent of
$(A_0, \ldots, A_{i-1})$. Observe that $(S_1)$ holds thanks to Theorem 2.1. Next, we let $i$ be any positive
integer, we assume $(S_i)$ and we prove that $(S_{i+1})$ holds. Elementary calculus shows that $\zeta_n$ has
a linear-fractional distribution, so that $\zeta'_n$ has a geometric distribution. We now reason
conditionally given $A_i = h$. By $(S_i)$ and the definition of $A_i$, we get that conditional on $A_i = h$,

• the r.v. $(D_i(n); n > h)$ are independent r.v. distributed as $\zeta'_n$, and independent of
$(A_0, \ldots, A_{i-1})$;

• $D_i(h)$ has the law of $\zeta'_h$ conditional on $\zeta'_h \neq 0$, and it is independent of $(D_i(n); n > h)$
and $(A_0, \ldots, A_{i-1})$;

• $D_i(n) = 0$ for all $n < h$.

Let us apply the transition probability defined in the theorem. First, $D_{i+1}(n) = D_i(n)$ for all
$n > h$, so the r.v. $(D_{i+1}(n); n > h)$ are independent r.v. distributed as $\zeta'_n$, and independent of
$(A_0, \ldots, A_{i-1})$. Second, $D_{i+1}(h) = D_i(h) - 1$ has the law of $\zeta'_h - 1$ conditional on $\zeta'_h \neq 0$, which
is the law of $\zeta'_h$ because $\zeta'_h$ is geometrically distributed, and it is independent of $(D_i(n); n > h)$
and $(A_0, \ldots, A_{i-1})$. Third, the r.v. $(D_{i+1}(n); n < h)$ are new independent r.v. distributed as
$\zeta'_n$. As a consequence, conditional on $A_i = h$, the r.v. $(D_{i+1}(n); n \ge 1)$ are independent r.v.
distributed as $\zeta'_n$, and independent of $(A_0, \ldots, A_{i-1})$. Integrating over $h$ yields $(S_{i+1})$.

We deduce from $(S_i)$ that $A_i$ is independent of $(A_0, \ldots, A_{i-1})$ and is distributed as $A_1$. The
computation of the law of $A_1$ stems from well-known formulae involving linear-fractional BGW
processes (see [2]), namely

$$f_n(s) = 1 - \frac{(1 - (a/b))(1 - s)}{m^{-n}(s - (a/b)) + 1 - s}$$

if $a \neq b$, and

$$f_n(s) = \frac{na - (na + a - 1)s}{1 - a + na - nas}$$

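when $a = b$. Indeed, it is then straightforward to compute

$$f_n'(0) = m^{-n}\left(\frac{b - a}{b - a\,m^{-n}}\right)^2 \quad\text{and}\quad 1 - f_n(0) = \frac{b - a}{b - a\,m^{-n}}$$

if $a \neq b$, whereas

$$f_n'(0) = \left(\frac{1 - a}{1 - a + na}\right)^2 \quad\text{and}\quad 1 - f_n(0) = \frac{1 - a}{1 - a + na}$$

when $a = b$. Thanks to Theorem 2.1, the ratio of these quantities is $\mathbb{P}(A_1 > n)$. □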
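
The last step of the proof can be checked numerically. The sketch below (an illustration, not taken from the paper) iterates the linear-fractional generating function to obtain $f_n(0)$ and $f_n'(0)$ by the chain rule, forms the ratio $f_n'(0)/(1 - f_n(0))$, and compares it with the closed forms of Proposition 5.1. The function names and the three parameter pairs are assumptions chosen for illustration.

```python
from fractions import Fraction

def f(s, a, b):
    """Linear-fractional generating function f(s) = a + (1-a)(1-b)s / (1 - bs)."""
    return a + (1 - a) * (1 - b) * s / (1 - b * s)

def f_prime(s, a, b):
    """Derivative f'(s) = (1-a)(1-b) / (1 - bs)^2."""
    return (1 - a) * (1 - b) / (1 - b * s) ** 2

def tail_by_iteration(n, a, b):
    """Ratio f_n'(0) / (1 - f_n(0)), with f_n the n-th iterate of f."""
    value, deriv = Fraction(0), Fraction(1)      # f_0(0) = 0 and f_0'(0) = 1
    for _ in range(n):
        deriv *= f_prime(value, a, b)            # chain rule: f_{k+1}'(0) = f'(f_k(0)) f_k'(0)
        value = f(value, a, b)
    return deriv / (1 - value)

def tail_closed_form(n, a, b):
    """P(A_1 > n) as stated in Proposition 5.1."""
    if a == b:
        return (1 - a) / (n * a + 1 - a)
    m = (1 - a) / (1 - b)
    return (b - a) / (b * m ** n - a)

# Illustrative (assumed) parameter pairs: a > b, a < b and a = b.
cases = [(Fraction(1, 2), Fraction(1, 4)),
         (Fraction(1, 4), Fraction(1, 2)),
         (Fraction(1, 3), Fraction(1, 3))]
all_match = all(tail_by_iteration(n, a, b) == tail_closed_form(n, a, b)
                for a, b in cases for n in range(10))
print(all_match)
```

Because the parameters are rational, `Fraction` makes the comparison an exact identity rather than an approximate one.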



5.2 Disintegration of the quasi-stationary distribution 

In this subsection, we assume that $f'(1) < 1$ (subcritical case). Then it is well-known [2] that
there is a probability $(\alpha_k)_{k \ge 1}$ with generating function, say $\alpha$,

$$\alpha(s) = \sum_{k \ge 1} \alpha_k s^k, \qquad s \in [0,1],$$

such that

$$\lim_{n \to \infty} \mathbb{P}(Z_n = k \mid Z_n \ge 1) = \alpha_k, \qquad k \ge 1.$$

This distribution is known as the Yaglom limit. It is a quasi-stationary distribution, in the
sense that

$$\sum_{k \ge 1} \alpha_k\, \mathbb{P}_k(Z_1 = j \mid Z_1 \neq 0) = \alpha_j, \qquad j \ge 1.$$

Set

$$U := \min\{i \ge 1 : A_i = +\infty\}.$$

Then for any $i \le U < j$, the coalescence time between individuals $(0, i)$ and $(0, j)$ is
$\max\{A_k : i \le k < j\} = +\infty$, so that $(0, i)$ and $(0, j)$ do not have a common ancestor. Now set

$$V := \max\{A_k : 1 \le k < U\},$$

the coalescence time of the subpopulation $\{(0, i) : 1 \le i \le U\}$ (where it is understood that
$\max \emptyset = 0$), that is, $-V$ is the generation of the most recent common ancestor of this
subpopulation. We provide the joint law of $(U, V)$ in the next proposition.

Proposition 5.2 The law of $V$ is given by

$$\mathbb{P}(V \le n) = \alpha_1\, \frac{1 - f_n(0)}{f_n'(0)} = \frac{\alpha_1}{\mathbb{P}(A_1 > n)}, \qquad n \ge 0.$$

Conditional on $V = n \ge 1$, $U$ has the law of $Z_n$ conditional on $\zeta_n \ge 2$. In addition, $U$ follows
Yaglom's quasi-stationary distribution, which entails the following disintegration formula

$$\alpha(s) = \mathbb{P}(V = 0)\,s + \sum_{n \ge 1} \mathbb{P}(V = n)\, \mathbb{E}\big(s^{Z_n} \mid \zeta_n \ge 2\big), \qquad s \in [0,1].$$
Remark 7 In the linear-fractional case with parameters $(a, b)$ ($a > b$, subcritical case), the
Yaglom quasi-stationary distribution is known to be geometric with failure probability $b/a$. Thanks
to Proposition 5.1, the branch lengths are i.i.d. and are infinite with probability $1 - (b/a)$. Then
thanks to the previous proposition, the quasi-stationary size $U$ is the first $i$ such that $A_i$ is
infinite, which indeed follows the aforementioned geometric law.
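
The remark can be illustrated by simulation. The following sketch (an illustration under assumed parameter values $a = 1/2$, $b = 1/4$, with hypothetical helper names) samples i.i.d. branch lengths from the tail of Proposition 5.1 by inverse transform, records the index $U$ of the first infinite one, and compares the empirical frequencies with the geometric law with failure probability $b/a$.

```python
import math
import random

a, b = 0.5, 0.25                 # illustrative (assumed) subcritical parameters, a > b
m = (1 - a) / (1 - b)            # mean offspring number
p_inf = 1 - b / a                # P(A_1 = +infinity), the limit of tail(n)

def tail(n):
    """P(A_1 > n) from Proposition 5.1 (case a != b)."""
    return (b - a) / (b * m ** n - a)

def sample_branch_length(rng):
    """Inverse-transform sample of A_1; math.inf is returned with probability p_inf."""
    u = rng.random()
    if u <= p_inf:
        return math.inf
    n = 1
    while tail(n) > u:           # A_1 > n exactly when u < P(A_1 > n)
        n += 1
    return n

def sample_U(rng):
    """Index of the first infinite branch length in an i.i.d. sequence."""
    i = 1
    while sample_branch_length(rng) != math.inf:
        i += 1
    return i

def geometric_pmf(k):
    """Geometric law with failure probability b/a on {1, 2, ...}."""
    return (b / a) ** (k - 1) * (1 - b / a)

rng = random.Random(0)           # fixed seed so that the run is reproducible
draws = [sample_U(rng) for _ in range(20000)]
empirical = {k: draws.count(k) / len(draws) for k in range(1, 5)}
mean_U = sum(draws) / len(draws)
for k in range(1, 5):
    print(k, round(empirical[k], 3), geometric_pmf(k))
```

With these parameters the geometric law has mean $a/(a-b) = 2$, which the empirical mean `mean_U` should approximate up to Monte Carlo error.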



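Proof. From the transition probabilities of the Markov chain $(D_i; i \ge 1)$ given in Theorem
2.1, we deduce that

$$V = \max\{n : D_1(n) \neq 0\},$$

so that, thanks to Theorem 2.1,

$$\mathbb{P}(V \le n) = \prod_{k \ge n+1} \mathbb{P}(\zeta'_k = 0) = \frac{\prod_{k \ge 1} \mathbb{P}(\zeta'_k = 0)}{\prod_{k=1}^{n} \mathbb{P}(\zeta'_k = 0)} = \frac{\mathbb{P}(A_1 = +\infty)}{\mathbb{P}(A_1 > n)}.$$

Now since

$$\mathbb{P}(A_1 > n) = \frac{f_n'(0)}{1 - f_n(0)},$$

and because this last quantity converges to $\mathbb{P}(A_1 = +\infty) = \alpha_1$ as $n \to \infty$, we get the result for
the law of $V$.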
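
For a concrete instance of the law of $V$, one can specialize to the linear-fractional case with $a > b$, where $\alpha_1 = 1 - b/a$. Combining the formula for $\mathbb{P}(V \le n)$ with Proposition 5.1 then gives $\mathbb{P}(V \le n) = 1 - (b/a)\,m^n$. The sketch below (illustrative only; parameter values and function names are assumptions) checks this by iterating the generating function directly.

```python
from fractions import Fraction

a, b = Fraction(1, 2), Fraction(1, 4)   # illustrative (assumed) parameters, a > b
m = (1 - a) / (1 - b)                   # mean offspring number
alpha1 = 1 - b / a                      # P(A_1 = +infinity) in this case

def f(s):
    """Linear-fractional generating function."""
    return a + (1 - a) * (1 - b) * s / (1 - b * s)

def f_prime(s):
    """Derivative of the generating function."""
    return (1 - a) * (1 - b) / (1 - b * s) ** 2

def cdf_V(n):
    """P(V <= n) = alpha_1 (1 - f_n(0)) / f_n'(0), with f_n obtained by iteration."""
    value, deriv = Fraction(0), Fraction(1)      # f_0(0) = 0 and f_0'(0) = 1
    for _ in range(n):
        deriv *= f_prime(value)                  # chain rule for f_{k+1}'(0)
        value = f(value)
    return alpha1 * (1 - value) / deriv

# Point masses P(V = 0), P(V = 1), ... obtained by differencing the c.d.f.
pmf_V = [cdf_V(0)] + [cdf_V(n) - cdf_V(n - 1) for n in range(1, 12)]
simplified_ok = all(cdf_V(n) == 1 - (b / a) * m ** n for n in range(12))
print(simplified_ok, [str(p) for p in pmf_V[:4]])
```

In this example $V$ has a geometrically decaying tail, $\mathbb{P}(V > n) = (b/a)\,m^n$, and $\mathbb{P}(V = 0) = \alpha_1$ is the probability that the subpopulation consists of the single individual $(0, 1)$.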

Recall that $(-n, a_1(n))$ is the ancestor, at generation $-n$, of $(0, 1)$, so that the total number
of descendants $T_n := Z^{(-n, a_1(n))}(n)$ of $(-n, a_1(n))$ at generation 0 has the law of $Z_n$ conditional
on $Z_n \neq 0$. Now since no individual $(0, j)$ with $j > U$ has a common ancestor with $(0, 1)$, we
have the inequality $T_n \le U$. On the other hand, $(0, U)$ and $(0, 1)$ do have a common ancestor
(all coalescence times $(A_i; 1 \le i \le U - 1)$ are finite), so that there is $n$ such that $T_n = U$.
Since the sequence $(T_n)$ is obviously nondecreasing, it is stationary at $U$, that is,

$$\lim_{n \to \infty} T_n = U \quad \text{a.s.}$$

Since $T_n$ has the law of $Z_n$ conditional on $Z_n \neq 0$, $U$ follows the Yaglom quasi-stationary
distribution.

Actually, since $-V$ is the generation of the most recent common ancestor of $(0, 1)$ and
$(0, U)$, we have the following equality

$$U = T_V.$$

Now recall that $V = \max\{n : D_1(n) \neq 0\}$. By definition of $D_1$, we can write $\{V = n\} = E_n \cap F_n$,
where $E_n := \{D_1(k) = 0, \ \forall k > n\}$ and

$$F_n := \{D_1(n) \ge 1\} = \{\#\mathcal{D}(n, 1) \ge 2\}.$$

Now observe that $E_n$ is independent of all events of the form $F_n \cap \{T_n = k\}$ (it concerns the
future of parallel branches), so that $\mathbb{P}(T_n = k \mid V = n) = \mathbb{P}(T_n = k \mid F_n)$. In other words, $T_n$
conditional on $V = n$ has the law of $Z'_n$ conditional on $\zeta_n \ge 2$, where $Z'_n$ is $Z_n$ conditional on
$Z_n \neq 0$. Since $\{Z_n \neq 0\} = \{\zeta_n \ge 1\}$, we finally get that conditional on $V = n$, $U$ has the law
of $Z_n$ conditional on $\zeta_n \ge 2$. □
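
The disintegration formula of Proposition 5.2 can be checked numerically in the linear-fractional case. The sketch below is an illustration resting on one stated assumption: $\zeta_n$ is read as the number of children of the root having descendants at generation $n$, so that $\mathbb{E}(s^{Z_n} u^{\zeta_n}) = f\big(f_{n-1}(0) + u\,(f_{n-1}(s) - f_{n-1}(0))\big)$. Under this reading, $\mathbb{P}(\zeta_n \ge 2)$ and $\mathbb{E}(s^{Z_n}; \zeta_n \ge 2)$ follow by removing the terms of order 0 and 1 in $u$. Parameter values and function names are also assumptions.

```python
from fractions import Fraction

A, B = Fraction(1, 2), Fraction(1, 4)      # illustrative (assumed) parameters, a > b

def f(s):
    """Linear-fractional generating function with parameters (A, B)."""
    return A + (1 - A) * (1 - B) * s / (1 - B * s)

def f_prime(s):
    """Derivative of the generating function."""
    return (1 - A) * (1 - B) / (1 - B * s) ** 2

def yaglom_pgf(s):
    """Geometric law with failure probability b/a (see Remark 7)."""
    r = B / A
    return (1 - r) * s / (1 - r * s)

def disintegration_partial_sum(s, n_max):
    """P(V = 0) s + sum_{n=1}^{n_max} P(V = n) E(s^{Z_n} | zeta_n >= 2)."""
    alpha1 = 1 - B / A
    c_prev, fs_prev, g_prev = Fraction(0), s, Fraction(1)   # f_0(0), f_0(s), f_0'(0)
    total = alpha1 * s                                       # the n = 0 term
    for _ in range(n_max):
        slope = f_prime(c_prev)                              # f'(f_{n-1}(0))
        c, fs, g = f(c_prev), f(fs_prev), g_prev * slope     # f_n(0), f_n(s), f_n'(0)
        p_two = (1 - c) - slope * (1 - c_prev)               # P(zeta_n >= 2)
        e_two = (fs - c) - slope * (fs_prev - c_prev)        # E(s^{Z_n}; zeta_n >= 2)
        p_v = alpha1 * ((1 - c) / g - (1 - c_prev) / g_prev) # P(V = n)
        total += p_v * e_two / p_two
        c_prev, fs_prev, g_prev = c, fs, g
    return total

s0 = Fraction(1, 2)
partial = disintegration_partial_sum(s0, 60)
error = abs(partial - yaglom_pgf(s0))
print(float(partial), float(yaglom_pgf(s0)), float(error))
```

Under the stated assumption the terms telescope, so that the partial sum up to $N$ equals $\alpha_1 (f_N(s) - f_N(0))/f_N'(0)$, which converges to $\alpha(s)$ at geometric rate $m^N$.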